
ADVANCES IN APPLIED MATHEMATICS 18, 59-110 (1997)
ARTICLE NO. AM960501 


From Association to Causation via Regression* 

David Freedman 

Statistics Department, University of California, Berkeley, California 94720 
Received October 3, 1995; revised September 1996


For nearly a century, investigators in the social sciences have used regression 
models to deduce cause-and-effect relationships from patterns of association. Path 
models and automated search procedures are more recent developments. In my 
view, this enterprise has not been successful. The models tend to neglect the 
difficulties in establishing causal relations, and the mathematical complexities tend 
to obscure rather than clarify the assumptions on which the analysis is based. 
Formal statistical inference is, by its nature, conditional. If maintained hypotheses 
A, B, C,... hold, then H can be tested against the data. However, if A, B, C,... 
remain in doubt, so must inferences about H. Careful scrutiny of maintained 
hypotheses should therefore be a critical part of empirical work—a principle 
honored more often in the breach than the observance. This paper focuses on 
modeling techniques that seem to convert association into causation. The object is 
to clarify the differences among the various uses of regression, as well as the source 
of the difficulty in making causal inferences by modeling. The discussion will 
proceed mainly by examples, ranging from Yule (J. R. Stat. Soc. 62 (1899),
249-295) to Spirtes, Glymour, and Scheines (“Causation,” Lect. Notes in Statist.,
Vol. 81, Springer-Verlag, New York/Berlin, 1993). © 1997 Academic Press


1. OUTLINE 

Many treatments of regression seem to take for granted that the 
investigator knows the relevant variables, their causal order, and the 
functional form of the relationships among them; measurements of 
the independent variables are assumed to be without error. Indeed, Gauss 
developed and used regression in physical science contexts where these 
conditions hold, at least to a very good approximation.¹ Today, the textbook
theorems that justify regression are proved on the basis of such
assumptions. 

* Presented at the Notre Dame Conference on Causality in Crisis, Oct. 15-17, 1993. 

¹ Gauss was fitting orbits to astronomical observations, with least squares to estimate the
elements of the orbits [21]. Stigler [64, pp. 145-146] awards priority to Legendre [36].




0196-8858/97 $25.00 

Copyright © 1997 by Academic Press
All rights of reproduction in any form reserved.




Source: https://www.industrydocuments.ucsf.edu/docs/ptgj0001 

















In the social sciences, the situation seems quite different. Regression is 
used to discover relationships or to disentangle cause and effect. However, 
investigators have only vague ideas as to the relevant variables and their 
causal order; functional forms are chosen on the basis of convenience or 
familiarity; serious problems of measurement are often encountered. 

Regression may offer useful ways of summarizing the data and making 
predictions. Investigators may be able to use summaries and predictions to 
draw substantive conclusions. However, I see no cases in which regression 
equations, let alone the more complex methods, have succeeded as engines 
for discovering causal relationships. Of course, there may be success 
stories that I have not found; nor does a track record of failure necessarily 
project into the future. 

One of the first applications of regression techniques to social science is
Yule [71]. Recent examples will be found in Spirtes, Glymour, and Scheines
[62], to be cited here as SGS. (The SGS theory is summarized in Glymour
[23], cited as CG.) SGS have attracted considerable attention in the
philosophy of science, because they have developed computerized algorithms
that search for path models. With their algorithms, SGS claim to
make rigorous inferences of causation from association. This is a bold
claim, which does not survive examination.

The balance of this paper is organized as follows. Section 2 discusses 
Yule’s work. Sections 3 and 4 explain the critical idea of “exogeneity.”
Section 5 describes a contemporary regression model. Sections 6-10 review
SGS and reanalyze some of their examples. Sections 11-12 canvass
some mathematical issues. Possible responses to my critique will be found 
in Section 13. There is a brief review of the literature in Section 14, and 
conclusions are presented in Section 15. For ease of reference, standard 
formulas for regression are given in an appendix. I have tried to make 
most of the paper accessible to nonstatistical readers, particularly if they 
will permit the occasional undefined technical term; Sections 11 and 12 are 
more specialized. 


2. YULE’S REGRESSION MODEL FOR PAUPERISM 

One of the first regression models in social science was developed by 
Yule—“An Investigation into the Causes of Changes in Pauperism in 
England, Chiefly During the Last Two Intercensal Decades.”² In late 19th
century England, poor people could be supported either inside the poor 
house or outside. Did provision of support outside the poor house increase 
the number of poor people? 

² See [71; 64, pp. 345-358; 11].




To address this issue, Yule used data from the censuses of 1871, 1881, 
and 1891. (In England, the census is taken in years that end with 1.) He 
considered the periods 1871-1881 and 1881-1891, relating changes in the 
number of paupers to changes in the “outrelief ratio,” that is, the ratio 
between the number of paupers supported outside the poor house and 
inside. He used regression to control for two confounders—changes in the 
population and its age structure. 

His equation can be written as follows: 

ΔPaup = a + b × ΔOut + c × ΔPop + d × ΔOld + error. (1)

Here, Δ stands for percentage difference, Paup for the number of paupers,
Out for the outrelief ratio, Pop for population size, and Old for the
proportion of people aged 65 and over.

Yule’s unit of analysis was the “union,” which seems to have been a 
small geographical area like a county.³ He had four kinds of areas: rural,
mixed, urban, metropolitan. He used “Ordinary Least Squares” (OLS) to 
estimate the coefficients from the data, with a “50 cm. Gravet” slide rule 
to do the arithmetic. 

To be more specific, Yule estimated a separate equation for each
time period (1871-81 and 1881-91) and each kind of area. There were 2
time periods and 4 kinds of areas, thus, 2 × 4 = 8 equations. Within a
time period, all areas of the same kind (for instance, all rural unions) are
governed by one equation. (By coincidence, there are 4 coefficients in each
equation, and 4 kinds of areas.)
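
As a rough illustration of what fitting one such equation by ordinary least squares involves, the sketch below solves the normal equations for an intercept and three explanatory variables. Everything in it is an assumption made for illustration: the 32 synthetic "unions", the coefficient values, the noise level, and the helper names `solve_linear_system` and `ols` are hypothetical, and none of the numbers are Yule's data.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients for y ~ intercept + columns of rows."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic percentage changes (dOut, dPop, dOld); hypothetical, not Yule's data.
rng = random.Random(0)
true_coefs = [13.0, 0.75, -0.3, 0.05]  # hypothetical a, b, c, d
rows = [(rng.uniform(-80, 40), rng.uniform(-20, 30), rng.uniform(-10, 10))
        for _ in range(32)]
y = [true_coefs[0] + sum(c * v for c, v in zip(true_coefs[1:], r)) + rng.gauss(0, 5)
     for r in rows]

a, b, c, d = ols(rows, y)
print(f"a={a:.2f} b={b:.2f} c={c:.2f} d={d:.2f}")
```

The fit optimizes the whole linear combination on the right-hand side; nothing in the arithmetic says whether the individual coefficients have a causal interpretation.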

Yule was looking for the “Hooke’s Law of Poverty.” Nature ran an
experiment, with lots of variation over time and geography, and Yule
analyzed the results. Regression was needed to control for the confounding
effects of change in population and age structure. The equations were
held to show that, other things being equal, changes in the outrelief ratio
create corresponding changes in the number of paupers. Indeed, if you
increase the outrelief ratio by one percentage point but hold the other
factors constant, you will increase the number of paupers by b percent, b
being the coefficient of ΔOut in Eq. (1). More qualitatively, if b is
positive, welfare creates paupers.

For a moment, I turn from Yule to methodology. A regression equation 
like (1) is usually written as 

Y = Xβ + ε. (2)

In this equation, the vector Y represents the dependent variable, like
pauperism; the matrix X represents the explanatory (or “independent”)

³ There were about 600 such areas in England. A poor-law union “consisted of two or more
parishes combined for administrative purposes” [64, p. 346].



variables, like the outrelief ratio, population, and age structure. These are
observable. The vector β represents parameters, which are not observable
but may be estimated from the data; parameters are “social constants,”
which characterize the process that generated the data. In Yule’s equation,
β has four components: the parameters a, b, c, d in Eq. (1). The error or
“disturbance” term ε is also unobservable and represents the impact of
chance factors unrelated to X. Statistical inferences are often based on
“stochastic assumptions” about ε; e.g., ε is independent of X and its
components are independent and identically distributed with mean 0. For
details, see the Appendix.

Three possible uses for regression equations are 

(i) to summarize data, or

(ii) to predict values of the dependent variable, or 

(iii) to predict the results of interventions. 

Yule could certainly have summarized his data by saying that for a given
time period and unions of a specific type, with certain values of the 
explanatory variables, the change in pauperism was about so much and so 
much. In other words, he could have used his equations to estimate the 
average value of Y, given the values of X. This use of regression may run 
into technical problems if there are outliers, or nonlinearities in the 
regression surface. However, at least in principle, there do seem to be 
technical fixes for such problems. Furthermore, stochastic assumptions 
about the disturbance term play almost no role. Therefore, like most 
statisticians, I believe that regression can be quite helpful in summarizing 
large data sets. 

For prediction, there is a ceteris paribus assumption: the system will 
remain stable. Prediction is already more complicated than description. On 
the other hand, if you make a series of predictions and test them against 
data, it may be possible to show that the system is stable, or sufficiently 
stable for regression to be quite helpful.⁴ Again, any particular use of
regression to make predictions may go off the rails, but there do not seem 
to be essential difficulties of principle involved. 

Causal inference is different, because a change in the system is contemplated;
for example, there will be an intervention. Descriptive statistics tell
you about the correlations that happen to hold in the data; causal models 
claim to tell you what will happen to Y if you change X. Indeed, 
regression is often used to make counterfactual inferences about the past: 
what would Y have been if X had been different? This use of regression 


⁴ Meehl [41] provides some well-known examples. Predictive validity is best demonstrated
by making real ex ante forecasts in several different contexts: see Ehrenberg and Bound [13].



to make causal inferences is the most intriguing, and the most problematic.
Difficulties are created by omitted variables, incorrect functional
form, etc. Of course, if the results of causal modeling were with any
frequency checked against the results of interventions, the balance of
argument might be very different.⁵

For description and prediction, the numerical values of the individual 
coefficients fade into the background; it is the whole linear combination 
on the right-hand side of the equation that matters. For causal inference, 
it is the individual coefficients that do the trick. In Eq. (1), for example, it 
is b that should tell you what happens to pauperism when the outrelief 
ratio is manipulated. 

At this remove, the flaws in Yule’s argument may be apparent. For 
example, there seem to be some important variables missing from the 
equation, including variables that measure economic activity. Here is 
Yule’s comment on the last-named factor [71, p. 253]: 

A good deal of time and labour was spent in making trial of this idea, but the
results proved unsatisfactory, and finally the measure was abandoned altogether.

Yule [71] seems to have used the rate of population growth, ΔPop in Eq.
(1), as a proxy for economic activity, although that creates ambiguity.
Other things being equal, population growth will by itself add to the
number of paupers; in its role as proxy, however, population growth should
reduce pauperism.

The equations for metropolitan unions are shown below, for 1871-1881
and 1881-1891:⁶

(1871-1881)

ΔPaup = 13.19 + 0.755 × ΔOut - 0.322 × ΔPop
- 0.022 × ΔOld + residual.

(1881-1891)

ΔPaup = 1.36 + 0.324 × ΔOut - 0.369 × ΔPop
+ 1.37 × ΔOld + residual.

For example, one metropolitan union is Westminster. Over the period
1871-1881, the percentage changes in Out, Pop, and Old are -73, -9,

⁵ Also see Manski [40].

⁶ These, and the other six equations, are reported in Yule [71, Table C, p. 259]. His Table
XIX gives data for metropolitan unions, in the form of “percentage ratios” for 1871-1881
rather than differences, apparently to avoid negative numbers. The equations were fitted to
data; the numerical coefficients in the displays are estimates for the corresponding parameters
in (1); the residuals are observable, but are only approximations to unobservable
disturbance terms.



and 5, respectively. The percentage change in Paup predicted from the
regression equation is

13.19 + 0.755 × (-73) - 0.322 × (-9) - 0.022 × 5 = -39.

The actual percentage change in Paup is -48. The “residual” is

residual = actual - predicted = -48 - (-39) = -9.

The coefficients in the regression equation are estimated so as to minimize 
the size of the residuals. (Technically, it is the sum of the squares that is 
minimized—hence the term “least squares.”) The linear combination of 
explanatory variables on the right side of the equation has therefore been 
optimized; but there is no guarantee that individual coefficients will make 
much sense. 
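
The Westminster arithmetic can be replayed in a few lines. This is only a check on the numbers quoted above; the variable names are hypothetical, and the prediction is rounded to the nearest whole percent before the residual is taken, as in the text.

```python
# Coefficients of the fitted 1871-1881 equation for metropolitan unions,
# as quoted in the text.
intercept, b_out, c_pop, d_old = 13.19, 0.755, -0.322, -0.022

# Percentage changes for Westminster, as quoted in the text.
delta_out, delta_pop, delta_old = -73, -9, 5
actual = -48

predicted = intercept + b_out * delta_out + c_pop * delta_pop + d_old * delta_old
residual = actual - round(predicted)  # actual minus (rounded) predicted

print(round(predicted), residual)  # prints: -39 -9
```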

There are some noticeable inconsistencies in Yule’s coefficients, over 
time and across the various kinds of geography. Nor are the signs of the 
coefficients entirely reasonable. These inconsistencies may not by themselves
be fatal, but they certainly raise the question of whether the
equations hold true for any well-defined population of times and places. If 
the coefficients do not have a life of their own—outside Yule’s particular 
data set—they cannot be used to answer questions of the form, “What 
would happen if you change the outrelief ratio?” The coefficients may be 
useful for descriptive purposes, but not for causal inference or even 
prediction. 

Moreover, there are familiar difficulties of interpretation. At best, Yule 
showed that changes in pauperism and the outrelief ratio were associated, 
even after adjusting for changes in the population and its age structure. 
The direction of the causal arrow, however, is by no means clear. Yule’s 
theory is that outrelief is the cause and pauperism is the effect. That is a 
reasonable view. However, the opposite idea seems equally tenable—a 
union that is flooded with paupers may not be able to build poor houses 
fast enough and resorts to outrelief. If so, pauperism causes outrelief. 
Also, Governor Pete Wilson’s theory may have some plausibility for 19th
century England if not 20th century California: unions that provide generous
outrelief attract paupers from elsewhere.⁷

Yule must have been aware of these problems. After allocating the 
changes in pauperism to their various causes (including the residual), he 

⁷ According to Stigler [64, pp. 356-357], Pigou criticized Yule for ignoring “the non-quantitative
facts of the situation.... It is well known that, during recent years, those unions
in which out-relief has been restricted have, on the whole, enjoyed a general administration
much superior to that of other unions.” Stigler responds that “Pigou’s ad hoc
speculation... could not, of course, be disproved from the data Yule used.” In effect, this
allows Yule to defend himself by pleading ignorance.



[Figure 1: path diagram with nodes ΔOut, ΔPop, and ΔOld.]

Fig. 1. Yule’s model for pauperism. The figure represents Eq. (1) in graphical form. The
asterisks denote a high degree of statistical significance. To determine the asterisks, I
recomputed Yule’s regression for the metropolitan unions over the period 1871-1881, using
data in his Table XIX. I replicated his coefficients, as shown in the display, although roundoff
error is quite large:

ΔPaup = 12.884 + 0.752 × ΔOut - 0.311 × ΔPop + 0.056 × ΔOld + residual,
SE:     10.367   0.135          0.067          0.223
t:       1.24    5.57          -4.645          0.25

Under the coefficients are standard errors (SEs) and t-statistics. The SE indicates the likely
size of the difference between an estimated coefficient and its true value. The t-statistic is the
ratio of an estimate to its SE. Generally, a t-statistic above 2 or 3 in absolute value indicates
that the corresponding parameter is unlikely to be truly 0. The parameters are features of the
model, and the SEs are computed on the basis of the stochastic assumptions in the model; for
details, see the appendix. In Fig. 1, the explanatory variables are correlated; such correlations
are often signaled by curved, double-headed arrows; error terms are not shown either.


withdraws all causal claims with one deft sentence: “Strictly, for ‘due to’
read ‘associated with.’” [71, p. 270, footnote 25]. Yule’s paper is quite 
modern in spirit, with two exceptions: he did not rely on statistical 
significance, and he did not use a graph. Figure 1 brings him up to date. 
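
The t-statistics in the caption of Fig. 1 are simply estimates divided by their standard errors, which is easy to check. The sketch below uses the rounded figures printed in the caption, so the last digit can differ slightly from the reported values. The dictionary keys are hypothetical labels, and the threshold of 2 is the caption's rule of thumb, not necessarily the criterion behind the asterisks in the figure.

```python
# Estimates and standard errors as printed in the caption of Fig. 1.
coefs = {"intercept": 12.884, "dOut": 0.752, "dPop": -0.311, "dOld": 0.056}
ses = {"intercept": 10.367, "dOut": 0.135, "dPop": 0.067, "dOld": 0.223}

# t-statistic = estimate / standard error.
t_stats = {name: coefs[name] / ses[name] for name in coefs}

for name, t in t_stats.items():
    # Flag coefficients whose t-statistic exceeds 2 in absolute value.
    flag = "*" if abs(t) > 2 else ""
    print(f"{name:9s} t = {t:6.2f} {flag}")
```

On these figures only the coefficients of ΔOut and ΔPop clear the threshold. That is a statement about the model's stochastic assumptions, not about the direction of causation.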


3. REGRESSION ESTIMATES AND 
CONDITIONAL EXPECTATIONS 

In the regression model (2), Y is the dependent variable, like pauperism; 
X represents the explanatory variables, like the outrelief ratio, population, 
and age structure. If all goes well, the regression equation will estimate the 
“conditional expectation” of Y given X = x, that is, the average value of Y 
corresponding to given values for the explanatory variables. 

To clarify the definitions, consider two procedures:

Procedure 1. Select subjects with X = x; look at the average of their Y’s.




Procedure 2. Intervene and set X = x for some subjects; look at the
average of their Y’s.

These procedures are quite different. The first involves the data set as you 
find it. The second involves an intervention. 

Regression does seem to let you move from selection to intervention; 
that is why the technique is so popular. However, regression approximates 
the selection procedure, rather than intervention. Nor does the statistical 
analysis prove that the two procedures give the same results; how could it? 
Instead, causal inferences are made by assuming that selection tells you 
what would happen if you were to intervene. 
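
A small simulation may make the contrast between the two procedures concrete. The sketch below rests entirely on assumptions chosen for illustration: a hidden binary factor U raises both the chance that X = 1 and the level of Y, while X itself has no effect on Y. The probabilities 0.8 and 0.2, the shift of 2.0, and the function name are hypothetical.

```python
import random
from statistics import fmean


def selection_vs_intervention(n=100_000, seed=1):
    """Return the gap in average Y between X = 1 and X = 0 under each procedure."""
    rng = random.Random(seed)
    selected = {0: [], 1: []}  # Procedure 1: group subjects by the X they have
    assigned = {0: [], 1: []}  # Procedure 2: X is set from outside
    for _ in range(n):
        u = rng.random() < 0.5                     # hidden common cause
        y = (2.0 if u else 0.0) + rng.gauss(0, 1)  # Y depends on U only, not on X
        # Procedure 1: X as found in the data; U makes X = 1 more likely.
        x_found = 1 if rng.random() < (0.8 if u else 0.2) else 0
        selected[x_found].append(y)
        # Procedure 2: X set by a coin toss, independently of U.
        x_set = rng.randrange(2)
        assigned[x_set].append(y)
    gap_selected = fmean(selected[1]) - fmean(selected[0])
    gap_assigned = fmean(assigned[1]) - fmean(assigned[0])
    return gap_selected, gap_assigned


gap_selected, gap_assigned = selection_vs_intervention()
print(f"selection gap: {gap_selected:.2f}, intervention gap: {gap_assigned:.2f}")
```

Under these assumptions the selection gap should come out near 1.2 (the average of Y is 1.6 among subjects with X = 1 and 0.4 among those with X = 0), while the intervention gap should be near 0. A regression fitted to the selected data would approximate the first number, not the second.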

The phrase “X is exogenous” is often taken to mean that selecting on X 
will produce the same results as intervening to set the value of X —the 
basic assumption in many analyses. Exogeneity also has weaker meanings, 
to be taken up later. The ambiguity is unfortunate, because analysts may 
assume exogeneity in a weak sense and proceed as if they had established 
something more. It is only exogeneity in the strong sense defined above 
that enables you to predict the results of interventions from nonexperimental
data.

The distinction between selection and intervention is acknowledged 
even by the modelers (Pearl [44, p. 396]): 

Formally speaking, probabilistic analysis is indeed sensitive only to covariations, 
so it can never distinguish genuine causal dependencies from spurious correlations ....

Such admissions—like Yule’s [71] footnote 25—are fatal to the enterprise. 
Of course, Pearl does not give up. For instance, he goes on to say that 
experiments just provide the opportunity to observe yet more correlations, 
a move he attributes to Simon [59]. 

Figure 2 is Pearl’s [44]. On the left, it seems that X and Z cause Y: 
manipulating X or Z will change Y. However, if only we had measured the 


[Figure 2: two path diagrams. Panel (a): nodes X, Z, and Y. Panel (b): nodes U, V, X, Y, and Z.]

Fig. 2. After Judea Pearl [44, p. 397]. Causation cannot be inferred from association by
using causal models. In panel (a), X and Z are assumed to be independent. In panel (b), U
and V are assumed to be independent; it may be shown in consequence that X and Z are
independent. Also see Duncan [12, pp. 113-127].






variables U and V, we might have seen that they were the joint causes of 
X, Y, and Z, as in the right-hand panel. If so, manipulating X and Z will 
not change Y at all. No amount of statistical analysis on the
observables (X, Y, and Z) can tell us which panel expresses the right
theory. Indeed, matters can be arranged so that both theories lead to the 
same joint distribution for the observables. 
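The point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the coefficients A and B, the standard normal errors, and the simplification that X equals U and Z equals V in panel (b) are all hypothetical choices made for the illustration, not part of Pearl's figure. Both models produce identical draws of the observables, yet they disagree about what happens when X and Z are set by intervention.

```python
import random

A, B = 2.0, -1.0  # hypothetical path coefficients, chosen for illustration


def panel_a(rng, do_x=None, do_z=None):
    """Panel (a): X and Z are independent causes of Y."""
    x = rng.gauss(0, 1) if do_x is None else do_x
    z = rng.gauss(0, 1) if do_z is None else do_z
    y = A * x + B * z + rng.gauss(0, 1)
    return x, y, z


def panel_b(rng, do_x=None, do_z=None):
    """Panel (b): independent U and V cause X, Y, and Z; X and Z do not affect Y."""
    u = rng.gauss(0, 1)
    v = rng.gauss(0, 1)
    x = u if do_x is None else do_x  # simplification: X copies U unless we intervene
    z = v if do_z is None else do_z  # simplification: Z copies V unless we intervene
    y = A * u + B * v + rng.gauss(0, 1)
    return x, y, z


def sample(model, n, seed, **do):
    """Draw n rows (x, y, z) from a model, optionally under an intervention."""
    rng = random.Random(seed)
    return [model(rng, **do) for _ in range(n)]


def mean_y(rows):
    return sum(y for _, y, _ in rows) / len(rows)


# Without intervention the two models yield the same draws of (X, Y, Z).
obs_a = sample(panel_a, 20000, seed=1)
obs_b = sample(panel_b, 20000, seed=1)
same_observables = obs_a == obs_b

# Setting X = 1 and Z = 0 moves Y in panel (a) but not in panel (b).
y_a = mean_y(sample(panel_a, 20000, seed=2, do_x=1.0, do_z=0.0))
y_b = mean_y(sample(panel_b, 20000, seed=2, do_x=1.0, do_z=0.0))
print(same_observables, round(y_a, 2), round(y_b, 2))
```

Under these assumptions the mean of Y after the intervention is close to A in panel (a) and close to 0 in panel (b), although no analysis of the observational rows could tell the panels apart.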


4. TWO IDEAS OF CONDITIONAL PROBABILITIES 

The distinction between the two ideas of conditioning (selecting subjects
with X = x, or intervening to set X = x) seems fundamental. A
concrete example may help, and conditional probabilities are easier to deal 
with than conditional expectations. 

Many studies have demonstrated an association between cervical cancer 
and exposure to two sexually transmitted diseases—herpes and chlamydia. 
Suppose we had data as shown in Table I. The incidence rate of cervical 
cancer is 200 per 100,000 for women exposed to herpes and chlamydia (top 
left); 116 per 100,000 for women exposed to herpes but not chlamydia; and 
130 per 100,000 for those exposed to herpes, the two exposure categories 
for chlamydia being combined. Other cells may be read in a similar way. 

With sample data, there is a role for technical statistics in estimation 
and testing—for instance, to see if the rates within a row are constant 
across columns. However, the real question is not association but causation.
Does herpes cause cervical cancer? What about chlamydia? Biotechnology
might find a way to eliminate Herpes simplex as well as Chlamydia
trachomatis. That would be a great relief, but would it reduce the incidence 
rate of cervical cancer? 

To consider the issue of causality more directly, suppose that we actually
know the rates for the population of interest, as shown in Table I.
Statistical testing must now fade into the background. The overall
incidence rate is 100 cervical cancers per 100,000 women (Table I, bottom
right). Among women exposed neither to herpes nor to chlamydia, the rate
is lower: 80 per 100,000. If cervical cancer is caused by herpes and
chlamydia, eliminating the microorganisms responsible for those diseases
should reduce the incidence rate of cervical cancer from 100 to 80 per
100,000. On the other hand, if the relationship is not causal, eliminating
those microorganisms will have little effect on the incidence rate of the
cancer.


TABLE I

Rate of Cervical Cancer Cases per 100,000 Women, by Exposure to Chlamydia and Herpes

                        Chlamydia
    Herpes        Yes      No       Total
    Yes           200      116      130
    No            180      80       87
    Total         190      90       100

Note. Data are hypothetical.


To be more explicit, 80/100,000 has been found by selecting women 
who are exposed to neither herpes nor chlamydia and by computing the 
incidence rate of cervical cancer for that group, one interpretation of 
conditional probability. If we intervene and eliminate the two diseases, we 
want to know the rate after the intervention; that is another interpretation. 
The two interpretations are different, because the underlying procedures 
are different. Statistical analysis of the numbers in the table, however 
refined or complex, cannot prove that a hypothetical intervention will give 
the same results as selection. This may seem obvious, even banal; but if 
you grant the point, the causal modeling game is largely over. 
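The "selection" reading of the table can be made concrete with a few lines of arithmetic. The sketch below treats each margin as a weighted average of cell rates. The cell rates are those of Table I, reading the herpes-no, chlamydia-yes cell as 180; the population shares of the four exposure groups are an assumption, since the table does not report them. Shares of 5%, 25%, 5%, and 65% happen to reproduce the published margins.

```python
# Cell rates per 100,000 from Table I, keyed by (herpes, chlamydia).
RATE = {(True, True): 200, (True, False): 116,
        (False, True): 180, (False, False): 80}

# ASSUMED population shares of the four exposure groups (not in the source).
SHARE = {(True, True): 0.05, (True, False): 0.25,
         (False, True): 0.05, (False, False): 0.65}


def selected_rate(keep):
    """Rate among the women selected by the predicate keep(herpes, chlamydia)."""
    cells = [cell for cell in RATE if keep(*cell)]
    weight = sum(SHARE[cell] for cell in cells)
    return sum(RATE[cell] * SHARE[cell] for cell in cells) / weight


overall = selected_rate(lambda h, c: True)
herpes_yes = selected_rate(lambda h, c: h)
herpes_no = selected_rate(lambda h, c: not h)
chlamydia_yes = selected_rate(lambda h, c: c)
chlamydia_no = selected_rate(lambda h, c: not c)
neither = selected_rate(lambda h, c: not h and not c)

print(round(overall), round(herpes_yes), round(herpes_no),
      round(chlamydia_yes), round(chlamydia_no), round(neither))
```

Every number computed here is a rate among women who were selected by their exposure status. Nothing in the calculation says what the rate would be if exposure were changed by intervention.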

What is the situation for Table I? The story is far from certain. Current
epidemiological opinion favors the idea that cervical cancer is caused by
certain strains of human papilloma virus (HPV); herpes and chlamydia
have no etiologic role, but serve only as markers for exposure to HPV. If
that opinion is correct, wiping out herpes and chlamydia will have no
impact on rates of cervical cancer.
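The two readings of Table I predict different results for the same intervention. The sketch below works this out under loudly stated assumptions: the exposure shares are hypothetical (the table does not give them), the "causal" reading is taken to mean that every woman ends up with the rate of the doubly unexposed cell, and the "marker" reading is taken to mean that each woman's rate is left exactly as it was.

```python
# Cell rates per 100,000 from Table I, keyed by (herpes, chlamydia).
RATE = {(True, True): 200, (True, False): 116,
        (False, True): 180, (False, False): 80}

# ASSUMED population shares of the four exposure groups (not in the source).
SHARE = {(True, True): 0.05, (True, False): 0.25,
         (False, True): 0.05, (False, False): 0.65}


def population_rate(rate_by_cell):
    """Overall rate when each exposure group has the given rate."""
    return sum(rate_by_cell[cell] * SHARE[cell] for cell in SHARE)


before = population_rate(RATE)

# Reading 1: herpes and chlamydia cause the cancer. After both are
# eliminated, every group has the rate of the unexposed cell.
after_if_causal = population_rate({cell: RATE[(False, False)] for cell in RATE})

# Reading 2: the infections only mark exposure to HPV. Eliminating them
# leaves HPV status, and hence each group's rate, unchanged.
after_if_marker = population_rate(RATE)

print(round(before), round(after_if_causal), round(after_if_marker))
```

The table is the same in both cases; only the assumption about mechanism differs, and that assumption decides whether the rate falls to 80 or stays at 100.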

Due in part to the rarity of cervical cancer, cohort studies do not seem
to be available. (The numbers in Table I, although hypothetical, are not
unreasonable.) My point is even stronger for the real studies of the
association between cervical cancer and herpes or chlamydia. Problems
created by incomplete data cannot simplify the task of inferring causation
from association.⁸


5. ANOTHER REGRESSION EXAMPLE 




Rindfuss et al. [55] propose a model to explain the process by which a 
woman decides how much education to get, and when to have her first 
child. The model illustrates many features of contemporary technique.⁹



⁸ For a discussion of the epidemiology, see Cairns [4], Peto and zur Hausen [51], Sherman
et al. [58], Hakama et al. [25], Munoz et al. [75].

⁹ I use this example because it is discussed by SGS [62, pp. 139-140].







Before we take up the model, let the authors say what they were trying to 
do: 

The interplay between education and fertility has a significant influence on the 
roles women occupy, when in their life cycle they occupy these roles, and the 
length of time spent in these roles... . This paper explores the theoretical 
linkages between education and fertility.... It is found that the reciprocal 
relationship between education and age at first birth is dominated by the effect 
from education to age at first birth with only a trivial effect in the other 
direction. [Abstract] 

No factor has a greater impact on the roles women occupy than maternity. 
Whether a woman becomes a mother, the age at which she does so, and the 
timing and number of subsequent births set the conditions under which other 
roles are assumed.... Education is another prime factor conditioning female 
roles. [p. 431, footnote omitted]

The overall relationship between education and fertility has its roots at some 
unspecified point in adolescence, or perhaps even earlier. At this point
aspirations for educational attainment as a goal in itself and for adult roles
that have implications for educational attainment first emerge. The desire for
education as a measure of status and ability in academic work may encourage
women to select occupational goals that require a high level of educational
attainment. Conversely, particular occupational or role aspirations may set
standards of education that must be achieved. The obverse is true for those
with either low educational or occupational goals. Also, occupational and
educational aspirations are affected by a number of prior factors, such as
mother's education, father's education, family income, intellectual ability,
prior educational experience, race, and number of siblings. [p. 432, citations
omitted]

The model used by Rindfuss et al. [55] is shown in Fig. 3. The diagram
corresponds to two linear equations in two unknowns, ED and AGE
(variables are defined in Table II):

$$\mathrm{ED} = a \times \mathrm{AGE} + A, \tag{3}$$

$$\mathrm{AGE} = a' \times \mathrm{ED} + A'. \tag{4}$$

According to the model, a woman chooses her educational level and age at
first birth as if by solving these two equations for the two unknowns.
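For reference, the two equations can be solved explicitly by substituting each into the other. The derivation below is a sketch that assumes $a a' \neq 1$, a condition the text does not state but which is needed for a unique solution.

```latex
\begin{aligned}
\mathrm{ED}  &= a\,(a'\,\mathrm{ED} + A') + A
  &&\Longrightarrow\quad \mathrm{ED} = \frac{A + a\,A'}{1 - a\,a'}, \\[4pt]
\mathrm{AGE} &= a'\,(a\,\mathrm{AGE} + A) + A'
  &&\Longrightarrow\quad \mathrm{AGE} = \frac{A' + a'\,A}{1 - a\,a'}.
\end{aligned}
```

Each solution involves both $A$ and $A'$. In particular, unless $a' = 0$, AGE depends on $A$ and is therefore generally correlated with the random error in the equation for ED.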

The coefficients a and a’ are “social constants,” to be estimated from 
the data. The terms A and A' take background factors into account: 

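$$A = A_0 + b \times \mathrm{DADSOCC} + c_1 \times \mathrm{RACE} + \cdots + c_7 \times \mathrm{YCIG} + \text{random error drawn from a box}, \tag{5}$$

$$A' = A'_0 + b' \times \mathrm{FEC} + c'_1 \times \mathrm{RACE} + \cdots + c'_7 \times \mathrm{YCIG} + \text{another random error drawn from a box}. \tag{6}$$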

Again, the parameters $A_0, b, c_1, \ldots$ are social constants to be estimated
from the data. The random errors are assumed to have mean 0, to be





[Figure 3: path diagram relating the exogenous variables (DADSOCC, RACE,
NOSIB, FARM, REGN, ADOLF, REL, YCIG, FEC) to ED and AGE.]

Fig. 3. The model in diagram form [55; 62, p. 140]. Variables are defined in Table II.
Explanatory variables (DADSOCC, RACE, etc.) are correlated; error terms are not shown in
the diagram.

TABLE II

Variables in the Model [55]

The endogenous variables

    ED        Respondent's education (years of schooling completed at
              first marriage)
    AGE       Respondent's age at first birth

The exogenous variables

    DADSOCC   Respondent's father's occupation
    RACE      Race of respondent (Black = 1, other = 0)
    NOSIB     Respondent's number of siblings
    FARM      Farm background (coded 1 if respondent grew up on a farm,
              else 0)
    REGN      Region where respondent grew up (South = 1, other = 0)
    ADOLF     Broken family (coded 0 if both parents present at age 14,
              else 1)
    REL       Religion (Catholic = 1, other = 0)
    YCIG      Smoking (coded 1 if respondent smoked before age 16, else
              coded 0)
    FEC       Fecundability (coded 1 if respondent had a miscarriage
              before first birth; else coded 0)

Note. The data are from a probability sample of 1766 women 35-44
years of age residing in the continental United States; the sample was
restricted to ever-married women with at least one child. DADSOCC
was measured on Duncan's scale, combining information on education
and income; missing values were imputed at the overall mean. SGS [62,
p. 139] gives the wrong definitions for NOSIB and ADOLF.



statistically independent from woman to woman, and to be identically 
distributed. Correlations across Eqs. (5) and (6) are permitted. 

Equations (3)-(6) are not quite regression equations, due to the simultaneity
of (3) and (4); fitting by OLS (ordinary least squares) would create
“simultaneity bias.” Thus, Rindfuss et al. [55] use an estimation procedure
called “two-stage least squares.”¹⁰ FEC does not enter into Eq. (5), nor
DADSOCC into Eq. (6). Graphically, there is no arrow from DADSOCC 
to AGE in Fig. 3; likewise, there is no arrow from FEC to ED. These 
behavioral assumptions are critical to the statistical enterprise. Without 
them, or some similar assumptions, two-stage least squares could not be 
used. Technically, the system would not be “identifiable” (Section 11.4). 
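A toy simulation may help show what simultaneity bias is and how an excluded variable is used to remove it. This is a sketch, not the procedure of Rindfuss et al. [55]: the parameter values, the distributions of DADSOCC and FEC, and the omission of the seven shared background variables are all hypothetical simplifications. With a single instrument and no other regressors, two-stage least squares reduces to the ratio of covariances computed below.

```python
import random


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n, a, a_prime, seed=0):
    """Draw (ED, AGE, FEC) from a stripped-down version of Eqs. (3)-(6).

    Hypothetical choices: DADSOCC is standard normal, FEC is 1 with
    probability 0.3, b = 1, b' = -2, and both errors are standard normal.
    """
    rng = random.Random(seed)
    ed, age, fec = [], [], []
    denom = 1.0 - a * a_prime
    for _ in range(n):
        dadsocc = rng.gauss(0, 1)
        f = 1.0 if rng.random() < 0.3 else 0.0
        big_a = 1.0 * dadsocc + rng.gauss(0, 1)       # Eq. (5): no FEC term
        big_a_prime = -2.0 * f + rng.gauss(0, 1)      # Eq. (6): no DADSOCC term
        ed.append((big_a + a * big_a_prime) / denom)  # solution of (3) and (4)
        age.append((big_a_prime + a_prime * big_a) / denom)
        fec.append(f)
    return ed, age, fec


TRUE_A = 0.2
ed, age, fec = simulate(n=40000, a=TRUE_A, a_prime=0.5)

ols_slope = cov(ed, age) / cov(age, age)  # regression of ED on AGE
iv_slope = cov(ed, fec) / cov(age, fec)   # FEC serves as the instrument for AGE
print(round(ols_slope, 2), round(iv_slope, 2))
```

In this setup the OLS slope is pulled well away from the value of a used to generate the data, while the instrumental estimate comes out close to it. The instrumental estimate is only as good as the assumption built into `simulate` that FEC has no direct effect on ED and is independent of the errors, which is exactly the kind of assumption at issue in the text.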

The main empirical finding: The estimated coefficient of AGE in the 
first equation is not “statistically significant”; i.e., the coefficient a in (3) 
could be zero. The sort of woman who drops out of school to have a child 
would drop out anyway. 

If looked at coldly, the argument may seem implausible. A critique can 
be given along the following lines:

(i) Statistical assumptions. Just why are the errors independent and
identically distributed across the women? Independence may be reasonable,
but heterogeneity is more plausible than homogeneity.

(ii) The assumption of constant coefficients. Rindfuss et al. [55] are
assuming that the same parameters apply to all women alike, from poor
blacks in the cities of the Northeast to rich whites in the suburbs of the
West. Why?

(iii) Omitted variables. Surely, important variables have been omitted
from the model, including two that were identified by Rindfuss et al.
[55]: aspirations and ability. Malthus thought that wealth was an important
factor. Social class matters, and DADSOCC measures only one of its
aspects.¹¹

(iv) What about the “no arrow” assumptions, from DADSOCC to AGE 
and FEC to ED? 

(v) Are FEC and DADSOCC exogenous? 

(vi) Are the equations “structural”? 

Questions (iv)-(vi) will be discussed in the next section, as will the idea of 
“structural” equations. 


¹⁰ See, e.g., Maddala [39]; for discussion, see Daggett and Freedman [9].

¹¹ The solution to the “omitted variable” problem may seem easy: just throw some more
variables into the model. The difficulties are explored in Clogg and Haritou [6]. Also see
Freedman [17].



5.1. A Thought Experiment 

A simpler version of the model restricts attention to a more homogeneous
group of women, where the only relevant background factors are
DADSOCC and FEC. To make causal inferences from the data using the
model, we need to believe that the arrows are as shown in Fig. 4, that
DADSOCC and FEC are exogenous, and that the equations are “structural.”
The following thought experiment may help to define the last term,
and the empirical commitments behind the words.¹²

The gedanken experiment involves two groups of women. In both 
groups, fathers are randomized to jobs, and some of the daughters are 
chosen at random to have a miscarriage before their first child. (The 
statistical terminology of randomization is dry; the gedanken experimentalist
intervenes, for instance, to make the fathers do one job rather than
another; professors are caused to work as plumbers, and taxi drivers are 
installed as hospital anesthetists.) 

Group I. Daughters are randomized to the various levels of ED, and
AGE is observed as the response. (The gedanken experimentalist strikes 
again, forcing some women to stay in school longer than they wish, while 
preventing others from continuing their education.) 

Group II. Daughters are randomized to the various levels of AGE, 
and ED is observed as the response. (More gedanken intervention is 
needed.) 

The statistical model can now be translated. For the women in Group I, 
AGE should not depend on DADSOCC—the “no arrow” assumption; 
however, AGE should depend linearly on ED. For the women in Group II, 
ED should not depend on FEC—the other “no arrow” assumption; 


[Figure 4: path diagram with nodes DADSOCC, FEC, ED, and AGE.]

Fig. 4. A simpler version of the model.


¹² Also see Pearl [46, 47].





however, ED should depend linearly on DADSOCC. The discovery of Rindfuss
et al. [55] is that ED would not depend on AGE.

There is one final assumption: the equations and parameters that 
describe the responses of the women in the experiment must also describe 
the natural situation. That is what “structural” means. For instance, a
woman who freely chooses her educational level and her time to bear
children does so by using the same equations as a woman made to give 
birth at a certain age. In short, with respect to the matters at issue, life in 
Des Moines proceeds more or less along the same lines as life in the 
Gulag. 

The thought experiment provides the intellectual foundation for the 
model, by articulating the background assumptions. These assumptions 
have not been subjected—cannot be subjected—to direct empirical proof, 
nor can assumptions be validated by appealing to thought experiments that 
are almost unthinkable. Do the modelers have some other method in 
reserve? If the assumptions remain unvalidated, what is the logical status 
of their implications? 

5.2. Exogeneity 

Identifying the exogenous variables is a major problem. For example, 
you can obtain results quite different from those of Rindfuss et al. [56] by
using variables other than DADSOCC and FEC as “instruments.”¹³ Rindfuss
et al. [56, pp. 981-982] respond that estimates made by


instrumental variables ... require strong theoretical assumptions ... and can 
give quite different results when alternative assumptions are made... it is 
usually difficult to argue that behavioral variables are truly exogenous and that
they affect only one of the endogenous variables but not the other. 

In short, results can depend quite strongly on assumptions of exogeneity, 
and there is no good way to justify one set of assumptions rather than 
another. Also see Bartels [1], who comments on the impact of exogeneity 
assumptions and the difficulty of verification. 


¹³ See Hofferth and Moore [27, 42]. An “instrument” is an exogenous variable, used as part
of the two-stage least squares estimation procedure. Some investigators may draw a
terminological distinction: an “instrument” is exogenous, but does not appear as an explanatory
variable in the equation being estimated. For purposes of estimation, exogenous variables are 
assumed to be independent of error terms; this does not suffice for causal inference (Section 
11). Even the independence assumption is not to be made lightly: see Clogg and Haritou [6]. 




6. AUTOMATED SEARCHES FOR CAUSALITY 

SGS [62] have computerized algorithms that search for path models.
Using the algorithms, SGS claim to make rigorous inferences of causation
from association. For present purposes, a “path model” is a recursive
system of regression equations, in which the dependent variables from
some equations are used as explanatory variables in later equations.¹⁴

The basic idea in path models is this: putative causes combine with 
parameters and random errors by multiplication and addition in order to 
produce their effects. I have discussed such models elsewhere and do not 
believe they offer much help in deducing causation from association, 
because there is little evidence to support the basic assumptions 
(Freedman [18]). To pursue the discussion here, a slightly more explicit 
definition of the models may be in order.

Definition. A “path model” starts with variables at “level 0,” which 
are exogenous in the minimal sense that they are not explained within the 
model. Variables “at level 1” are built up as linear combinations of level 0 
variables, plus independent random errors. More generally, variables “at
level k” are built up as linear combinations of variables at previous levels;
again, there are additive, independent random errors. Variables at level 1,
level 2, ... are “endogenous,” in the sense that they are explained within
the system. The path model may be presented as a “path diagram,” like
Fig. 1, or Fig. 5 below. Nodes represent variables in the model; if there are
arrows from X, Y, ... to Z, then X, Y, ... are explanatory variables in the
regression equation for Z. Nodes are often called “vertices,” and the
diagrams are referred to as “graphs” or “causal graphs.”¹⁵
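The definition can be sketched as a small data generator. Everything specific here is hypothetical: the variable names, the coefficients, the normal errors, and the choice to make the level 0 variables independent of one another are assumptions made only so that the example runs.

```python
import random

# Hypothetical path model. Each entry is (variable, coefficients on
# variables at earlier levels, standard deviation of its random error).
MODEL = [
    ("X", {}, 1.0),                     # level 0: not explained within the model
    ("Y", {}, 1.0),                     # level 0: not explained within the model
    ("Z", {"X": 0.5, "Y": -1.0}, 1.0),  # level 1: built from level 0 variables
    ("W", {"Z": 2.0, "X": 0.3}, 1.0),   # level 2: built from earlier levels
]


def draw(model, rng):
    """Generate one realization, visiting the variables in level order."""
    values = {}
    for name, parents, sd in model:
        linear_part = sum(coef * values[p] for p, coef in parents.items())
        values[name] = linear_part + rng.gauss(0, sd)  # additive independent error
    return values


def arrows(model):
    """List the arrows (explanatory variable, dependent variable) of the diagram."""
    return [(p, name) for name, parents, _ in model for p in parents]


one_draw = draw(MODEL, random.Random(0))
diagram = arrows(MODEL)
print(sorted(one_draw), diagram)
```

The list returned by `arrows` is the path diagram in tabular form: an arrow from X to Z means that X appears as an explanatory variable in the regression equation for Z.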

The path model may represent mere association (conditional dependence
and independence relations). Or the model may represent causation.
I will take that up later. For now, however, either interpretation suffices. 


¹⁴ The model used by Rindfuss et al. [55] would not fall into this category, if ED and AGE
really influenced each other. The SGS [62] framework excludes reciprocal causation, by
assumption; so do path models, as I define them. However, some authors extend the
definition of path models to include simultaneous equation models for reciprocal causation.

¹⁵ SGS [62] seem to make the strong (and quite unusual) assumption that exogenous
variables are independent of each other. That may be part of the reason why their algorithms
estimate such peculiar models in Figs. 5 and 6 below. There is another, even more esoteric,
point. To estimate an equation, its error term need only be assumed independent of the
explanatory variables. If so, error terms from different equations may be correlated; then
standard procedures for computing the correlations among the variables will not apply; see
Freedman [18, pp. 112-114]; Seneta [57, p. 199]. SGS seem to interpret correlated errors as
indicating the presence of “latent variables.” Such variables will be mentioned in notes to
Figs. 5 and 6, below.






Fig. 5. The left-hand panel shows the model reported by SGS [62]. The right-hand panel
also shows connections among the regressors, as determined by the search program
TETRAD. BUILD indicates that latent variables are present, i.e., errors are correlated across
equations. BUILD asks whether it should assume “causal sufficiency”; without this
assumption [62, p. 45], the program output is uninformative. Therefore, I told BUILD to make the
assumption; I believe that is what SGS [62] did for the Rindfuss example. Also see Spirtes
et al. [63, pp. 13-15]. I told BUILD that ED and AGE could not cause the remaining
variables, following SGS [62, p. 139]. However, SGS [62] actually made the stronger
assumption that (i) FEC, ED, and AGE could not cause YCIG, and (ii) FEC, ED, AGE, and YCIG
could not cause the remaining variables. With the assumption of causal sufficiency, BUILD
seems to use the PC algorithm; without the assumption, the FCI algorithm comes into play.
Much of this information comes from Richard Scheines (personal communication). Data are
from Rindfuss et al. [55], not SGS [62]; with the SGS covariance matrix, FARM causes REGN
and YCIG causes ADOLF.


Suppose the graph is “sparse”—each equation in the model involves
relatively few variables. Suppose, too, there are no troublesome algebraic 
identities among the regression coefficients; in SGS terminology, the 
distribution is “faithful” to its graph [62, p. 35]; see Section 11.2 below. 
You have a sample—many independent realizations of variables 
X,Y, Z,... . You are willing to assume the distribution conforms to a path 
model, but do not know which model. You do not even know which 
variables are at level 0, which are at level 1, and so forth. 

SGS [62] claim their algorithms are likely to find the underlying path
model, or a rather similar model, and in short order. Their most convincing
evidence is based on simulation experiments, where the computer
generates data from a path model and the SGS algorithms try to infer the
model from the data [62, pp. 145ff, 152ff, 250ff, 320ff, 332ff]; in these
experiments, the algorithms do very well. Roughly speaking, the SGS
algorithms are variants of “best subsets” regression, the search being over
graphs rather than subsets. The data come into the SGS algorithms only
through the covariance structure. The rest of the apparatus (the diagrams,
the Markov property, faithfulness, etc.) consists of assumptions.
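
The kind of simulation experiment described above can be sketched in a few lines of Python. This is a toy illustration only, not TETRAD and not the SGS code: the chain X -> Y -> Z, the coefficient 0.8, the sample size, and the helper names `corr` and `partial_corr` are all assumptions made for the example. It shows that the data enter only through correlations, and that the partial correlation of X and Z given Y is close to zero, which is the sort of pattern a correlational search looks for.

```python
import math
import random


def corr(u, v):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def partial_corr(r_xz, r_xy, r_yz):
    """Correlation of X and Z after adjusting both linearly for Y."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000               # hypothetical sample size

# Hypothetical path model: X -> Y -> Z with independent normal errors.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * a + rng.gauss(0, 1) for a in x]
z = [0.8 * b + rng.gauss(0, 1) for b in y]

r_xy, r_yz, r_xz = corr(x, y), corr(y, z), corr(x, z)
r_xz_given_y = partial_corr(r_xz, r_xy, r_yz)

print(f"r(X,Z) = {r_xz:.3f}; r(X,Z | Y) = {r_xz_given_y:.3f}")
```

The same near-zero partial correlation would appear if the data were generated from the reversed chain Z -> Y -> X, so the correlations by themselves do not fix the direction of the arrows.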

SGS [62] seem to assert that their algorithms determine causality, as a 
matter of mathematics. Such assertions are not defensible. In the SGS 
formalism, causation is obtained not by mathematical proof but by mathe¬ 
matical assumption. If you assume that the arrows in the underlying path 
diagram represent causes, then the arrows found by the algorithms repre¬ 
sent causes. If you assume that the underlying arrows represent mere 
associations, then the arrows found by the algorithms represent associa¬ 
tions. Causation has to do with empirical reality, not with mathematical 
proofs based on axioms. The issue is not one of theorems, but of the
connection between theorems and reality. 

The SGS algorithms [62], like many earlier statistical procedures (factor
analysis, LISREL, etc.), proceed by analyzing the correlation matrix of a
set of variables. I will call such methods “correlational.” Sections 7-10
consider applications of the SGS algorithms to real examples. Sections 
11-12 try to explain the key ideas in the SGS formalism and indicate by 
mathematical example some of the intrinsic limitations. Before proceed¬ 
ing, however, I discuss the SGS statement of assumptions. 

6.1. The SGS Statement of Assumptions 

SGS [62] discuss the role of assumptions in their theory several times 
(pp. 53-69, pp. 75-81, pp. 324-325, p. 351). However, the clearest state¬ 
ment can be found when SGS are trying to discredit the evidence that 
smoking causes lung cancer: 

effects * * * * cannot be predicted from * * * * sample conditional
probabilities. [p. 302]


Readers may consult the original for context, to see whether the omitted 
material affects the meaning. The advantage of the quote is clarity. If the 
statement is generally applicable, then SGS—like Yule and Pearl before 
them—have disavowed the ability to infer causation from association. 



7. THE SGS EXAMPLES 

SGS [62] share my pessimistic views about regression. They claim, 
however, that their algorithms will succeed where regression has failed: 

In the absence of very strong prior causal knowledge, multiple regression
should not be used to select the variables that influence an outcome or
criterion variable in data from uncontrolled studies. So far as we can tell, the
popular automatic regression search procedures [like stepwise regression] should
not be used at all in contexts where causal inferences are at stake. Such
contexts require improved versions of algorithms like those described here to
select those variables whose influence on an outcome can be reliably estimated
by regression. In applications, the power of the specification searches against
reasonable alternative explanations of the data is easy to determine by
simulation ... . [p. 257]

At first reading, SGS seems to be filled with real examples showing the
successful application of their algorithms. That is an illusion. Many of the
examples are based on simulation, and I set those aside.^16 The real
examples are mostly to be found in SGS [62, pp. 132-152, 243-256].^17

The main examples given in SGS [62] are path models. But these cannot 
withstand scrutiny—see Section 5 above and Sections 8-9 below. One 
exception is the stratification model of Blau and Duncan [3]. SGS [62, pp. 
142-145] seem to be quite critical of this model; their current position is 
almost diametrically opposite to the one in Glymour et al. [24, pp. 33-39].
Like SGS, I do not believe that the Blau-Duncan regressions are a
satisfactory causal model. On the other hand, as descriptions of the data,
the equations can tell us something important about our society: see
Freedman [18, pp. 122, 220]. The discussion in SGS adds little to our
understanding either of the model or of stratification. 

SGS [62] appear to use the health effects of smoking as a running 
example to illustrate their theory.^18 Again, there is an illusion. The causal
diagrams are all hypotheticals, no contact is made with data, and no 
substantive conclusions are drawn. If the diagrams were proposed as real 
descriptions of causal mechanisms, they would not be credible. 

What about the substantive question: does smoking cause lung cancer,
heart disease, and many other illnesses? SGS [62] appear not to believe the
epidemiological evidence. When they actually get down to arguing their
case, they use a rather old-fashioned method: a literature review with
arguments in ordinary English [62, pp. 291-302]. Causal models and search
algorithms have disappeared.

^16 Simulations tell us how well the SGS algorithms do if the underlying statistical
assumptions hold good; the assumptions are built into the computer code that generates the
simulated data. When applying statistical algorithms to real data, a critical question is whether
those assumptions hold. The simulations do not address such questions.

^17 Parallel material is in [23, pp. 13-16, 21-23].

^18 See, e.g., [62, pp. 18, 216-237].

I approve of the method if not the implementation; the summary is 
wrong in some places and tendentious in others. However, the review does 
show the complexity of the issues. To make judgments about causation, 
you need to consider death certificate data, necropsy data, case control 
and cohort studies, twin studies, dose response curves, as well as animal 
experiments and human experiments. The force of the epidemiological 
evidence—and the SGS critique—depends on the complex interplay among 
these various studies and data sets. 

In the end, SGS [62] do not really make bottom-line judgments on the 
health effects of smoking, at least so far as I can see. Their principal 
conclusion is methodological: nobody understood the issues. 


Neither side understood what uncontrolled studies could and could not
determine about causal relations and the effects of interventions. The statisticians
pretended to an understanding of causality and correlation they did not have;
the epidemiologists resorted to informal and often irrelevant criteria, appeals to
plausibility, and in the worst case to ad hominem .... While the statisticians
didn’t get the connections between causality and probability right, the ...
“epidemiological criteria for causality” were an intellectual disgrace, and the
level of argument ... was sometimes more worthy of literary critics than
scientists. [62, pp. 301-302]


Part of a sentence in SGS [62, p. 4] does seem to grant one of the major 
claims made by the epidemiologists, “smoking does cause lung cancer.” 
But that only complicates the puzzle. If you don't believe the evidence, 
why accept the claim? 

Despite SGS [62], the epidemiologists did have a good understanding of 
the issues and made a strong case against smoking. The arguments were 
imperfect, and some reasonable doubts may remain. But the data, taken 
all in all, are compelling. The epidemiological literature on smoking is far 
stronger than anything I have seen in the social sciences. For a survey of 
the evidence, see Cornfield et al. [7]; this paper is still worth reading. More 
recent data are reviewed in [30]. 

SGS [62] elected not to use their analytical machinery on the smoking 
data—a remarkable omission. When applied to the examples that SGS 
actually chose, the algorithms produce one small disaster after another, as 
will now be seen. In sum, SGS [62] claim to have developed techniques for 
generating causal models; but they do not have any success stories. 




8. USING THE SGS SEARCH PROCEDURE 

The SGS search procedures are embodied in a computer program called 
TETRAD [62]. Version 2.1 of this program was kindly provided by Richard 
Scheines and Peter Spirtes. The BUILD module is the part of TETRAD 
used to discover path models with no latent variables. I ran BUILD on two 
examples—Rindfuss et al. [55] and AFQT (to be discussed in Section 9). 

8.1. Rindfuss et al. 

To explain AGE (age at first birth) in the Rindfuss et al. [55] example,
the SGS [62] algorithms select the variables shown in Table III. Regression
estimates for the coefficients, based on summary data in SGS, are reported
in the first three columns of the table. The coefficients for ADOLF (the
indicator for women from broken homes) and YCIG (an indicator for
smoking by age 16) have positive signs. That is paradoxical: women from
broken homes and women who smoke should be having children earlier,
not later.^19 The signs should be negative, not positive. SGS do not
comment on this issue.


TABLE III

The SGS [62] model for age at first birth, computed using the SGS covariance matrix or the
Rindfuss et al. [55] covariance matrix

          SGS covariance             Rindfuss et al. covariance
          ($R^2 = 0.27$)             ($R^2 = 0.24$)

          Estimate    SE       t     Estimate    SE       t

RACE       -1.66     .30    -5.50     -1.66     .30    -5.46
REGN       -0.56     .19    -3.01     -0.63     .19    -3.35
ADOLF       1.89     .22     8.60      2.01     .22     8.98
YCIG        2.14     .25     8.63     -0.89     .25    -3.53
FEC         2.72     .28     9.70      2.77     .28     9.72
ED          0.67     .04    18.00      0.60     .04    15.72

Note. (i) Intercepts are not reported; OLS estimates.

(ii) The first column in Table III shows parameter estimates. The second shows standard
errors, or SEs, which indicate the likely size of the differences between the estimates and the
true parameter values. The t-statistics in the third column are the ratios of estimates to SEs.
Generally, a t-statistic above 2 or 3 in absolute value indicates that the corresponding
parameter is unlikely to be truly 0. For details, see the Appendix.

(iii) The parameters are features of the model, and the SEs are computed using the model.
If you do not believe in the existence of the parameters apart from the data, or do not accept
the statistical assumptions in the model, the SEs and t-statistics are likely to be meaningless.
In any case, performing multiple tests (as in a search algorithm) complicates the
interpretation of the t-statistics [17, 23].

(iv) $R^2$ is generally interpretable as a descriptive statistic, whether or not the assumptions
of the model hold true. An $R^2$ of 0.27 indicates that about 27% of the variance in AGE has
been explained; that is not much, and models in the social science literature often have even
less explanatory power. For a discussion of $R^2$, see [20, pp. 78-81].

(v) According to current epidemiological opinion, smoking does have some biological
effect, delaying conception by several weeks. However, the women who choose to smoke are
different from the nonsmokers and have their first child almost a year earlier. This effect
remains even after controlling for the measured background factors in the regression; the
coefficient of YCIG is -0.89 years.

Rindfuss et al. [55] give standard deviations and correlations for their
data; SGS [62, p. 139] used these statistics to compute a covariance matrix, 
but reversed some of the signs. The last three columns of Table III report 
regression estimates computed from the correct covariances. The problem 
with YCIG disappears, but the sign for ADOLF stays positive. Anyone can 
make a mistake entering data; ignoring paradoxical signs in a causal model 
is quite another matter. 
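
The step from standard deviations and correlations to covariances is simple arithmetic, and a reversed sign carries straight through to a regression slope. The following is a minimal sketch in Python with made-up numbers: the values of `sd_x`, `sd_y`, and `r` are hypothetical (they are not taken from Rindfuss et al. [55] or SGS [62]), and the model has one regressor rather than the six of Table III.

```python
import math


def covariance(sd_i, sd_j, r_ij):
    """Covariance recovered from two standard deviations and a correlation."""
    return r_ij * sd_i * sd_j


# Hypothetical summary statistics for a regressor x and a response y.
sd_x, sd_y, r = 0.4, 3.0, -0.25

cov_correct = covariance(sd_x, sd_y, r)
cov_reversed = covariance(sd_x, sd_y, -r)  # same entry with the sign reversed

# With one regressor, the OLS slope is cov(x, y) / var(x).
slope_correct = cov_correct / sd_x ** 2
slope_reversed = cov_reversed / sd_x ** 2

print(f"slope from correct covariance:  {slope_correct:.3f}")
print(f"slope from reversed covariance: {slope_reversed:.3f}")
```

In a multiple regression the effect of one reversed entry is less transparent, since every coefficient depends on the whole covariance matrix, but the mechanism is the same.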

SGS [62] report only a graphical version of their model. They say, 

Given the prior information that ED and AGE are not causes of the other 
variables, the PC algorithm (using the .05 significance level for tests) directly 
finds the model [in Figure 5(a)] where connections among the regressors are 
not pictured. [62, p. 139]

However, connections among regressors can be of interest. Although 
TETRAD is supposed to discover the causal ordering of explanatory 
variables, it produces the very strange model shown in Fig. 5(b). For 
example, the model says that race and religion cause region of residence. 
Comments on the sociology may be out of place, but consider the statistics. 
The equation is 

\[ \mathrm{REGN} = a + b \times \mathrm{RACE} + c \times \mathrm{REL} + e. \tag{7} \]

REGN is a dummy variable, coded 1 for respondents who grew up in the
South, 0 for others; RACE is 1 for black respondents and 0 for others;
REL is 1 for Catholics, 0 for others; $e$ is normally distributed. In
consequence, this equation forces impossible values on REGN: the left-hand
side is 0 or 1, the right-hand side varies from $-\infty$ to $+\infty$. Now $R^2$ is only
0.16, so $e$ contributes most of the variance; Eq. (7) can hardly be defended
as an approximation. Having dummy variables in the middle of path 
diagrams is a blunder. (FARM creates a similar problem; so does NOSIB, 
although less extreme.) In short, the SGS algorithms have produced a 
model that fails the most basic test—internal consistency. 
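
The inconsistency in Eq. (7) is easy to see by simulation. In the sketch below, the coefficients `a`, `b`, `c`, the error standard deviation `sd_e`, and the 30% frequencies for RACE and REL are all hypothetical, chosen only for illustration; nothing here is estimated from the Rindfuss data. The point is that the right-hand side of the equation essentially never equals 0 or 1, and often falls outside the interval from 0 to 1.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical parameter values for the right-hand side of Eq. (7).
a, b, c, sd_e = 0.3, 0.4, -0.2, 0.45
n = 10000

rhs = []
for _ in range(n):
    race = 1 if rng.random() < 0.3 else 0  # hypothetical 30% coded 1
    rel = 1 if rng.random() < 0.3 else 0   # hypothetical 30% coded 1
    e = rng.gauss(0, sd_e)                 # normally distributed error
    rhs.append(a + b * race + c * rel + e)

# A dummy variable on the left-hand side can only be 0 or 1.
exactly_0_or_1 = sum(v in (0.0, 1.0) for v in rhs)
outside_unit_interval = sum(v < 0 or v > 1 for v in rhs) / n

print("right-hand sides equal to 0 or 1:", exactly_0_or_1)
print("share outside [0, 1]:", round(outside_unit_interval, 2))
```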

^19 Smoking, broken homes, and early childbearing seem to be correlates of social
disadvantage and indicators of personality traits. DADSOCC and RACE are quite imperfect controls
for family background; therefore, YCIG and ADOLF are likely to pick up the effects of
background, as well as the effects of omitted personality variables. See note (v) to Table III.
This sort of bias is discussed in Section 12.2 below. Also see Clogg and Haritou [6].


9. THE ARMED FORCES QUALIFICATION TEST

SGS [62] discuss an example based on the Armed Forces Qualification
Test (AFQT).^20 The AFQT is a linear combination with fixed weights of
scores on certain subtests. Some of these subtests, as well as subtests that
are not part of the AFQT, are listed in Table IV. The problem is to decide
which subtests go into the AFQT and which do not.

The problem may be stated more algebraically as





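\[ \text{AFQT score} = a_1 \times \mathrm{NO} + a_2 \times \mathrm{WK} + \cdots + a_7 \times \mathrm{GS} + b_1 \times \mathrm{UN}_1 + \cdots + b_n \times \mathrm{UN}_n, \tag{8} \]

where $\mathrm{UN}_1, \ldots, \mathrm{UN}_n$ are unobservable. Some of the $a$’s are zero, and the
challenge is to figure out which ones.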

We have data on 6224 subjects, summarized as a covariance matrix. 
According to SGS [62, pp. 243-244]: 

a linear multiple regression of AFQT on the other seven variables gives
significant regression coefficients to all seven and thus fails to distinguish the
tests that are in fact linear components of AFQT ... . Given the prior
information that AFQT is not a cause of any of the other variables, the PC algorithm in
TETRAD II correctly picks out {AR, NO, WK} as the only ... variables that can
be components of AFQT ... .

To test the claims about regression, I ran AFQT on all the observable 
subtests. As Table V shows, EI and MC are related to AFQT only at the
chance level. Moreover, MK and GS have negative coefficients, but 
psychometric practice frowns on subtests that are negatively related to 
overall test scores. It is a natural conjecture that NO, WK, and AR go into 
AFQT, while the other four subtests do not. Contrary to the claims of 
SGS, the AFQT can be handled by ordinary statistical methods. 


TABLE IV

Subtests Analyzed by SGS [62]

1. Numerical Operations         NO
2. Word Knowledge               WK
3. Arithmetical Reasoning       AR
4. Mathematical Knowledge       MK
5. Electronics Information      EI
6. Mechanical Comprehension     MC
7. General Science              GS

Note. Some go into the AFQT and some do not.

^20 SGS [62, p. 243]. Institutional background on the AFQT will be found in Section 12.5.







TABLE V

Regression of AFQT on All the Observable Subtests

        Estimate     SE        t

NO        0.24      .022     10.8
WK        1.17      .029     40.5
AR        1.03      .028     36.4
MK       -0.24      .028     -8.7
EI       -0.03      .024     -1.3
MC        0.03      .024      1.3
GS       -0.13      .029     -4.6

Note. Variables were centered at their means.
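
The screening argument based on Table V can be written out as a short calculation. The sketch below uses the estimates and SEs printed in Table V; the rule it applies (keep a subtest if its coefficient is positive and its t-ratio exceeds 2) is an assumption made for the example, following the rule of thumb in note (ii) to Table III and the remark that negatively related subtests are frowned on. The t-ratios are recomputed from the rounded table entries, so they can differ slightly from the printed values.

```python
# Estimates and standard errors as printed in Table V.
table_v = {
    "NO": (0.24, 0.022),
    "WK": (1.17, 0.029),
    "AR": (1.03, 0.028),
    "MK": (-0.24, 0.028),
    "EI": (-0.03, 0.024),
    "MC": (0.03, 0.024),
    "GS": (-0.13, 0.029),
}

THRESHOLD = 2.0  # rule-of-thumb cutoff for the t-ratio (an assumption)

# t-ratio = estimate / SE, recomputed from the rounded entries.
t_ratio = {name: est / se for name, (est, se) in table_v.items()}

# Keep subtests with a positive coefficient that is clearly nonzero.
candidates = [
    name
    for name, (est, se) in table_v.items()
    if est > 0 and t_ratio[name] > THRESHOLD
]

for name, t in t_ratio.items():
    print(f"{name}: t = {t:6.1f}")
print("probable components of AFQT:", candidates)
```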


The AFQT problem is in some ways quite easy. By definition, the 
“causes” or subtests combine linearly with the parameters to produce the 
AFQT as an “effect.” Joint normality of test scores seems to follow from 
the procedures used to construct the tests: consequently, scores on any one 
subtest can be presented as a linear combination of other subtest scores, 
with additive random errors. Thus, critical issues in most empirical studies 
have disappeared.^21

9.1. TETRAD 

According to SGS, given the prior information that AFQT does not
cause the other variables, TETRAD correctly picks out AR, NO, and WK
as the components of the AFQT.^22 Without that prior information,
however, TETRAD declares AFQT to be the cause of these subtests, rather
than the effect. With the prior information, TETRAD produces the strange
results shown in Figure 6.^23 Now, for instance, the subtest NO may
“cause” the overall test score AFQT, but it can hardly cause the other
subtests AR or MK. Furthermore, there is a cycle in the figure:

\[ \mathrm{MC} \to \mathrm{AR} \to \mathrm{WK} \to \mathrm{GS} \to \mathrm{MC}. \]

In principle, such cycles were excluded by prior assumption, as well they 
might be. Subtests should not cause themselves, even indirectly. To sum 
up: 

(i) ordinary least squares techniques pick out NO, AR, and WK for 
the probable components of the AFQT, just as TETRAD does; 

(ii) TETRAD produces the curious model in Figure 6. 

^21 On the other hand, unobserved variables may create serious problems (Section 12.4).

^22 SGS [62, p. 243].

^23 The program output is given in Spirtes et al. [63, pp. 10-11].






Fig. 6. AFQT and its subtests arranged in causal order by the search program TETRAD.
I believe SGS [62, pp. 243-244] used BUILD, with the assumption of causal sufficiency, for
the AFQT example. Also see Spirtes et al. [63, pp. 8-11]. The program indicates there are
latent variables, i.e., correlations in the errors.


10. FOREIGN INVESTMENT AND POLITICAL OPPRESSION

As noted in Section 7, SGS are quite pessimistic about typical social-science
applications of regression. While I agree with the bottom line, their
specific objections seem misplaced. One example is enough to make the
point. Timberlake and Williams [65] offer a regression model to explain
political exclusion (PO) in terms of foreign investment (FI), energy
development (EN), and civil liberties (CV). High values of PO correspond to
authoritarian regimes that exclude most citizens from political participation;
high values of CV indicate few civil liberties. Data come from 72
countries. Correlations among the Timberlake-Williams variables are
shown in Table VI.

The equation proposed by Timberlake and Williams [65] is 

\[ \mathrm{PO} = a + b \times \mathrm{FI} + c \times \mathrm{EN} + d \times \mathrm{CV} + \text{error}. \tag{9} \]

Empirical results are shown in the first three columns of Table VII. The
estimated coefficient of FI is significantly positive and is interpreted as
measuring the effect of foreign investment on political exclusion;
see Timberlake and Williams [65, p. 143].


TABLE VI

The Timberlake and Williams Correlation Matrix

         PO       FI       EN       CV

PO     1.000    -.175    -.480     .868
FI     -.175    1.000     .330    -.391
EN     -.480     .330    1.000    -.430
CV      .868    -.391    -.430    1.000

Note. Correlation matrix for political oppression (PO),
foreign investment (FI), energy development (EN), and civil
liberties (CV). Source: [62, p. 249].

SGS discuss this example [62, pp. 248-250], suggesting that Timberlake
and Williams have confused cause and effect. The alternative causal
sequence is not spelled out. Presumably, the idea is that dictators “cause” 
foreign investment in the sense that investors think dictatorial regimes 
offer greater stability, etc. 

The main step in the SGS statistical argument comes down to this: the
correlation of -0.175 between political exclusion and foreign investment
is at the chance level. The calculation rides on two assumptions: (i) the 72
countries in the data set are a random sample from some much larger set
of countries and (ii) the variables follow a multivariate normal distribution.
These time-honored but madcap assumptions are not stated explicitly by
SGS, let alone justified. (Of course, the assumptions behind the statistics
in Timberlake and Williams might seem equally antic.)


TABLE VII

The Timberlake and Williams Model

               R^2 = .81                   R^2 = .93
         Estimate    SE       t      Estimate    SE       t
   FI       .23     .059     3.9        .44     .036     12
   EN      -.18     .060    -2.9       -.22     .037     -6
   CV       .88     .061    14.4        .95     .038     25

Note. Political exclusion (PO) is regressed on foreign investment (FI), energy
development (EN), and civil liberties (CV). The first three columns show results
for the observed correlation matrix (Table VI). The last three columns show what
happens when r(PO, FI) is set to 0. Coefficients in Table VII are standardized, that
is, computed from variables standardized to have mean 0 and variance 1. The
coefficients reported by SGS [62, p. 249] are not standardized and therefore do not
match the correlation matrix.

However, for the sake of argument, let us grant SGS [62] their
assumptions. On that basis, the standard error for the correlation in
question is about $1/\sqrt{72} \approx .12$. I change the suspect correlation
coefficient from its observed value of -0.175 to the new value of 0, a
difference of about 1.5 SEs. I then recompute the model (last three columns
in Table VII). The results are even better for Timberlake and Williams: the
estimated coefficients are bigger and more significant; the signs stay the
same; and $R^2$ moves closer to 1.
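
The recomputation can be checked from the correlation matrix alone. The sketch below is illustrative rather than the author's own calculation: it assumes NumPy, takes the entries of Table VI as given, and assumes 72 - 4 = 68 residual degrees of freedom for the standard errors. The function name `standardized_fit` is hypothetical.

```python
import numpy as np

# Correlation matrix from Table VI, ordered PO, FI, EN, CV.
R = np.array([
    [1.000, -0.175, -0.480, 0.868],
    [-0.175, 1.000, 0.330, -0.391],
    [-0.480, 0.330, 1.000, -0.430],
    [0.868, -0.391, -0.430, 1.000],
])
N_COUNTRIES = 72


def standardized_fit(corr, n):
    """Regress variable 0 on the others, using only the correlation matrix."""
    rxx = corr[1:, 1:]             # correlations among explanatory variables
    rxy = corr[1:, 0]              # correlations with the response
    beta = np.linalg.solve(rxx, rxy)
    r2 = float(beta @ rxy)         # R^2 for standardized variables
    df = n - len(beta) - 1         # assumed residual degrees of freedom
    se = np.sqrt((1.0 - r2) / df * np.diag(np.linalg.inv(rxx)))
    return beta, se, r2


# First three columns of Table VII: the observed matrix.
beta_obs, se_obs, r2_obs = standardized_fit(R, N_COUNTRIES)

# Last three columns: r(PO, FI) is set to 0.
R0 = R.copy()
R0[0, 1] = R0[1, 0] = 0.0
beta_new, se_new, r2_new = standardized_fit(R0, N_COUNTRIES)

# A positive smallest eigenvalue means the altered matrix is still
# positive definite, hence a legitimate correlation matrix.
min_eig = float(np.linalg.eigvalsh(R0).min())
```

Under these assumptions the estimates, standard errors, and $R^2$ values agree with Table VII to the precision shown there, and `min_eig` is positive.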

I will not defend the model any further. Measurement problems are
extreme, and the list of omitted variables very long. SGS may well be right
that cause and effect have been confused. But the demonstration is
peculiar. The correlation matrix cannot show that FI, EN and CV cause
PO, which is the fatal flaw in the Timberlake-Williams model. (Of course,
Timberlake and Williams are not alone in this respect.) Nor can the matrix
show that FI, EN and CV do not cause PO, which is the corresponding flaw in
SGS. Indeed, it is trivial to construct four variables labelled FI, EN, CV
and PO, such that FI, EN and CV do cause PO; but sample correlation
matrices will look rather like the one in Table VI. This only sharpens the
basic question. What do any of these calculations tell us about the world
outside the computer?


11. SOME MATHEMATICAL ISSUES 

Sections 11 and 12 address by mathematical example two questions: 

(i) To what extent can correlational methods recover an underlying 
path diagram? 

(ii) When can the arrows in the diagram be interpreted as indicating 
causation, rather than conditional independence and dependence? 

The examples will indicate how SGS [62] use the “faithfulness”
assumption to help them answer such questions. Issues of identifiability and
consistency will be discussed, and methodological contributions in SGS will
be delineated. Sections 11 and 12 are more technical than previous
material; readers can skip to Section 13 without losing the thread of the
argument.

The focus is on linear models. Suppose you have a covariance matrix
that describes certain variables. Assume these variables are jointly normal,
with mean 0; that avoids all questions of linearity etc. and all problems
created by having only finite amounts of data. However, the statistical
procedures I am considering, like the SGS algorithms, will operate on
that covariance matrix and on nothing else. Such procedures may be called
“correlational.”

Footnote (to Section 10). The new matrix is still positive definite, so it
is a legitimate correlation matrix. Section 12.1 discusses the connection
between the Timberlake-Williams model and the faithfulness assumption.
Also see Cartwright [5, pp. 79-84].

Path models were defined in Section 6. Briefly, you start with variables
at level 0; variables at level k are linear combinations of variables at lower
levels, plus independent random errors. In a path diagram, nodes represent
variables. There is an arrow from X to Y if X is used as an
explanatory variable in the equation for Y.

Exogeneity is a critical concept. As indicated before, the term is used in
at least three senses. The weakest definition is purely mechanical:
exogenous variables are not explained within the model, but are supplied to
the model. Variables at level 0 in a path model are exogenous in this minimal
sense. A more restrictive definition: exogenous variables are statistically
independent of the error terms in the equations. The third idea is the one
that is relevant to causal inference: X is exogenous if selecting subjects
with X = x gives the same results as intervening to set X = x.

There are tests for exogeneity in the literature, as well as model
specification tests. However, these have limited relevance to causal
inference. For example, Hausman [26] assumes that certain variables are known
a priori to be exogenous and then tests whether other variables are
exogenous; he interprets exogeneity as orthogonality to disturbance terms.
He also has a test that detects correlation between errors from equations
in a path model. White [69, 70] focuses on similar issues, for instance,
testing whether the variables have a jointly normal distribution.

Another reference in the econometric literature is Engle, Hendry, and
Richard [15]. These authors distinguish several kinds of exogeneity; “strict”
exogeneity means independence of variables and error terms, but only
“super” exogeneity permits estimating the effects of interventions.
Examples are given to illustrate the definitions [15, pp. 287-294]. There is
further discussion in Leamer [35].


11.1. The Basic Statistical Problem

Suppose you have n random variables with a jointly normal distribution;
all the variables have mean 0, and you know the covariance matrix, which
is positive definite. You wish to present this covariance matrix as a path
model. In a sense, nothing is easier. Simply order the variables, arbitrarily,
as $X_1, X_2, \ldots, X_n$. By successively applying regression, we can find
coefficients $a_{ij}$ and error terms $\epsilon_i$ such that
$X_1, \epsilon_2, \ldots, \epsilon_n$ are all independent
with mean 0, and Eq. (10) holds:

\[
\begin{aligned}
X_2 &= a_{21} X_1 + \epsilon_2 \\
X_3 &= a_{31} X_1 + a_{32} X_2 + \epsilon_3 \\
    &\;\;\vdots
\end{aligned}
\tag{10}
\]


Then $X_1$ is presented as exogenous and the “cause” of $X_2$; next, $X_1$ and
$X_2$ “cause” $X_3$; and so forth. In short, there are many ways to present a
covariance matrix as a path diagram; few if any will be relevant for causal
inference.
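
To make the arbitrariness concrete, the following sketch runs the successive regressions for every ordering of three variables and confirms that each resulting path model reproduces the covariance matrix. It is a minimal illustration, assuming NumPy; the 3 x 3 matrix and the function names `path_model` and `implied_cov` are hypothetical and chosen only for this example.

```python
import itertools
import numpy as np


def path_model(cov, order):
    """Successive regressions of each variable on its predecessors in `order`.

    Returns (A, err_var): A is strictly lower triangular with A[k, j] = a_kj,
    and err_var[k] is the variance of the k-th error term
    (err_var[0] is the variance of the first, exogenous variable).
    """
    s = cov[np.ix_(order, order)]
    n = len(order)
    A = np.zeros((n, n))
    err_var = np.zeros(n)
    err_var[0] = s[0, 0]
    for k in range(1, n):
        coef = np.linalg.solve(s[:k, :k], s[:k, k])
        A[k, :k] = coef
        err_var[k] = s[k, k] - coef @ s[:k, k]
    return A, err_var


def implied_cov(A, err_var):
    """Covariance of X when X = A X + e with independent errors e."""
    m = np.linalg.inv(np.eye(len(err_var)) - A)
    return m @ np.diag(err_var) @ m.T


# Hypothetical positive definite covariance matrix, for illustration only.
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

# Each of the 3! orderings yields a path model that reproduces cov.
all_match = True
for order in itertools.permutations(range(3)):
    idx = list(order)
    A, v = path_model(cov, idx)
    all_match = all_match and np.allclose(implied_cov(A, v),
                                          cov[np.ix_(idx, idx)])
```

Here `all_match` ends up true: the covariance matrix cannot favor one ordering over another.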

11.2. The Faithfulness Assumption 

How can you single out one path diagram from the many that correspond
to a given covariance matrix? At this point, SGS [62] seem to use the
“faithfulness” assumption; this assumption is also used to handle
confounding, as discussed in Section 12.1 below. Basically, a covariance matrix
is faithful to a diagram provided conditional dependencies and
independencies are determined by the presence or absence of arrows in the
diagram, rather than specific numerical values of parameters.

By way of example, Fig. 7 shows two path diagrams. On the left, X
causes W through the intervening variables Y and Z; on the right, the flow
of causality is reversed. The lower case letters on the arrows stand for
“path coefficients,” that is, standardized regression coefficients. How could
SGS [62] distinguish between the two theories in the figure? Their idea
seems to be as follows:

In the left-hand diagram, Y and Z are conditionally independent given X; on
the right, however, Y and Z are conditionally dependent given X.

Another contrast:

In the left-hand diagram, Y and Z are conditionally dependent given W; on the
right, however, Y and Z are conditionally independent given W.


Footnote. For the construction in (10), simply choose $a_{21}$ so
$E\{X_2 \mid X_1\} = a_{21} X_1$; choose $a_{31}$ and $a_{32}$ so
$E\{X_3 \mid X_1, X_2\} = a_{31} X_1 + a_{32} X_2$; and so forth. For details,
see the Appendix. Since the ordering of the variables in (10) is arbitrary,
fitting such equations or drawing path diagrams cannot determine which
variables are causes and which are effects. In particular, $X_1$ may be
exogenous in the sense that it is statistically independent of disturbance
terms; that by itself does not suffice to estimate the results of
manipulating $X_1$, since we cannot tell whether $X_1$ is a cause or an effect.

Footnote. In this section, I use “cause” in its ordinary (perhaps undefinable)
sense. However, the technical point, about the possibility of estimating path
diagrams from covariance matrices, still holds if the arrows are interpreted
as merely representing association. “Causation” is then colorful shorthand
(perhaps too colorful) for a certain kind of covariation.




Fig. 7. If two path diagrams have the same covariance matrix, correlational methods
cannot tell them apart; the faithfulness assumption is made to rule out such problems. The
lower case letters on the arrows denote “path coefficients,” that is, standardized regression
coefficients.


Therefore, the pattern of conditional dependence and independence
identifies the diagram. (In both diagrams, X and W are conditionally
independent given Y and Z.)

This idea works for many path diagrams, but fails for others. Indeed, the 
path coefficients can be chosen so the pattern of conditional dependence 
and independence is the same in the two diagrams. Even worse, both 
diagrams can give rise to the same covariance matrix—so correlational 
methods cannot tell which is right. SGS [62] make the “faithfulness 
assumption” in order to rule out such indeterminacies. (The workings of 
the assumption will be explained below.) 

However, that only moves the difficulty to another place. Faithfulness is 
hardly an empirical fact; it is an assumption about unobservables, made to 
rule out situations that cannot be handled by correlational methods. The
SGS analytical program can now be stated rather simply. If the arrows in a 
path diagram represent causation not association, and if the path diagram 
can be estimated from data, then SGS can indeed infer causation from 
association. 

The balance of Section 11.2 provides technical backup; readers can skip 
to Section 11.3. The left-hand panel in Fig. 7 is described by 

\[ Y = aX + \delta_1, \qquad Z = bX + \delta_2, \qquad W = cY + dZ + \delta_3. \tag{11} \]

In this equation, $X, \delta_1, \delta_2, \delta_3$ are independent and normal,
with mean 0; X, Y, Z, W all have variance 1. The covariance matrix of
X, Y, Z, W can be computed from the four parameters a, b, c, d as shown
in (12):

\[
\begin{array}{c|cccc}
  & X       & Y       & Z       & W       \\ \hline
X & 1       & a       & b       & ac + bd \\
Y & a       & 1       & ab      & c + abd \\
Z & b       & ab      & 1       & d + abc \\
W & ac + bd & c + abd & d + abc & 1
\end{array}
\tag{12}
\]



It is a little theorem, which follows by a tedious calculation from (48) in 
the Appendix, that 

cov{X,W\Y,Z) = 0. (13) 

This is an example of a conditional independence relation forced by a 
graph; (13) holds whatever the path coefficients in Fig. 7 may be. 
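
A numerical spot check of (13) is easy to set up. The sketch below assumes NumPy and uses the standard partial-covariance formula for jointly normal variables as a stand-in for (48), whose exact statement is in the Appendix; the function names and the range of the random coefficients are hypothetical choices.

```python
import numpy as np


def cov_left(a, b, c, d):
    """Covariance matrix (12) of X, Y, Z, W for the left-hand panel of Fig. 7."""
    return np.array([
        [1.0,           a,             b,             a * c + b * d],
        [a,             1.0,           a * b,         c + a * b * d],
        [b,             a * b,         1.0,           d + a * b * c],
        [a * c + b * d, c + a * b * d, d + a * b * c, 1.0],
    ])


def partial_cov(s, i, j, given):
    """cov(V_i, V_j | V_given) for jointly normal variables with covariance s."""
    g = list(given)
    s_gg = s[np.ix_(g, g)]
    return s[i, j] - s[i, g] @ np.linalg.solve(s_gg, s[g, j])


X, Y, Z, W = range(4)
rng = np.random.default_rng(0)

# Largest |cov(X, W | Y, Z)| seen over many random path coefficients.
worst = 0.0
for _ in range(1000):
    a, b, c, d = rng.uniform(-0.4, 0.4, size=4)
    s = cov_left(a, b, c, d)
    worst = max(worst, abs(partial_cov(s, X, W, [Y, Z])))
```

The value of `worst` stays at the level of rounding error, in line with the claim that (13) does not depend on the coefficients.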

The diagram on the left in Fig. 7 is reversible, provided

\[ \operatorname{cov}(Y, Z \mid W) = 0. \tag{14} \]

By (48) below, Eq. (14) is equivalent to

\[ \operatorname{cov}(Y, Z) = \operatorname{cov}(Y, W) \times \operatorname{cov}(Z, W). \tag{15} \]

By (12), this means

\[ ab = (c + abd)(d + abc). \tag{16} \]

Rearranging (16) gives the quadratic equation

\[ cd(ab)^2 - (1 - c^2 - d^2)\,ab + cd = 0. \tag{17} \]

One solution to (17) is

\[ ab = \frac{1 - c^2 - d^2 - \sqrt{(1 - c^2 - d^2)^2 - 4c^2 d^2}}{2cd}. \tag{18} \]


I chose a, c, d more or less at random, getting 0.1925, 0.2873, and
0.1245, respectively. I computed b from (18), getting 0.2063. This choice
forces the conditional independence relation (14) and violates the
faithfulness assumption; conditional independence comes from the parameter
values, not the presence or absence of arrows.
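
The arithmetic can be retraced in a few lines. This sketch uses only the Python standard library and takes the smaller root of the quadratic (17) in the product ab; the variable names are illustrative.

```python
import math

a, c, d = 0.1925, 0.2873, 0.1245   # values quoted in the text

# Smaller root of the quadratic (17), treated as an equation in ab.
k = 1.0 - c**2 - d**2
ab = (k - math.sqrt(k**2 - 4.0 * c**2 * d**2)) / (2.0 * c * d)
b = ab / a

# Entries of the covariance matrix (12) that enter (15).
cov_yz = a * b
cov_yw = c + a * b * d
cov_zw = d + a * b * c

# With var(W) = 1, cov(Y, Z | W) = cov(Y, Z) - cov(Y, W) * cov(Z, W).
cond_cov_yz_given_w = cov_yz - cov_yw * cov_zw
```

With these inputs `b` comes out close to 0.2063, and `cond_cov_yz_given_w` vanishes up to rounding error, so (14) holds even though both arrows into W are present.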

Footnote. There was a bit of luck here, because some values for a, c, d
will not produce correlation matrices.




Given the values for the four parameters a, b, c, d, the covariance
matrix (12) can be evaluated as

\[
\begin{pmatrix}
1.0000 & 0.1925 & 0.2063 & 0.0810 \\
0.1925 & 1.0000 & 0.0397 & 0.2922 \\
0.2063 & 0.0397 & 1.0000 & 0.1359 \\
0.0810 & 0.2922 & 0.1359 & 1.0000
\end{pmatrix}
\tag{19}
\]


The path coefficients in the right-hand panel of Fig. 7 are easily
computed from (19):

the path coefficient from W to Y is c' = cov(Y, W) = 0.2922;
the path coefficient from W to Z is d' = cov(Z, W) = 0.1359;
the path coefficients from Y and Z to X are obtained by multiple
regression, as a' = 0.1846 and b' = 0.1990.

With these choices, faithfulness does not hold and (19) can be represented
by either diagram in Fig. 7. (For details on multiple regression, see the
Appendix.) In effect, the faithfulness assumption precludes certain
algebraic identities among the parameters, like (16). Since parameters are not
observable, the faithfulness assumption is not subject to direct empirical
tests based on finite amounts of data.
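
The reversed coefficients can be recomputed from (19) directly. The following is a sketch assuming NumPy; the names `a_rev`, `b_rev`, `c_rev`, `d_rev` stand for a', b', c', d' and are otherwise arbitrary.

```python
import numpy as np

# Covariance matrix (19), ordered X, Y, Z, W.
S = np.array([
    [1.0000, 0.1925, 0.2063, 0.0810],
    [0.1925, 1.0000, 0.0397, 0.2922],
    [0.2063, 0.0397, 1.0000, 0.1359],
    [0.0810, 0.2922, 0.1359, 1.0000],
])
X, Y, Z, W = range(4)

# Right-hand panel of Fig. 7: W -> Y, W -> Z, then Y and Z -> X.
c_rev = S[Y, W]                    # simple regression of Y on W
d_rev = S[Z, W]                    # simple regression of Z on W
yz = [Y, Z]
a_rev, b_rev = np.linalg.solve(S[np.ix_(yz, yz)], S[yz, X])

# Covariances the reversed diagram implies for pairs it does not fit directly.
implied_yz = c_rev * d_rev                     # Y and Z share only the cause W
implied_xw = a_rev * c_rev + b_rev * d_rev     # X depends on W through Y and Z
```

Both implied covariances agree with the corresponding entries of (19) to about four decimals, which is what it means for the reversed diagram to represent the same matrix.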

11.3. Complete Graphs 

Even if the covariance matrix is faithful to a graph, however, problems
of indeterminacy remain, particularly if the graph is “complete” in the
sense that every pair of vertices is joined by an arrow. Figure 8 illustrates
this indeterminacy. The same covariance matrix (20) for the variables
X, Y, Z is represented either by the diagram in panel (a) or the one in
panel (b), where the flow of “causality” is reversed:

\[
\begin{array}{c|ccc}
  & X   & Y   & Z   \\ \hline
X & 1   & .46 & .50 \\
Y & .46 & 1   & .42 \\
Z & .50 & .42 & 1
\end{array}
\tag{20}
\]

Fig. 8. Graphs (a) and (b) have the same covariance matrix. Both are complete; there is
an arrow from every variable to every other variable. The numbers on the arrows are path
coefficients, that is, standardized regression coefficients.
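
Path coefficients for both orderings follow from (20) by regression. The sketch below assumes NumPy; the helper `regress` is a hypothetical name, and the assignment of orderings to panels (a) and (b) is not taken from the figure.

```python
import numpy as np

# Covariance matrix (20), ordered X, Y, Z.
S = np.array([
    [1.00, 0.46, 0.50],
    [0.46, 1.00, 0.42],
    [0.50, 0.42, 1.00],
])


def regress(s, target, predictors):
    """Standardized coefficients for regressing `target` on `predictors`."""
    p = list(predictors)
    return np.linalg.solve(s[np.ix_(p, p)], s[p, target])


X, Y, Z = range(3)

# One ordering: X first, then Y on X, then Z on X and Y.
y_on_x = regress(S, Y, [X])
z_on_xy = regress(S, Z, [X, Y])

# Reverse ordering: Z first, then Y on Z, then X on Y and Z.
y_on_z = regress(S, Y, [Z])
x_on_yz = regress(S, X, [Y, Z])
```

Each ordering gives a complete set of path coefficients for the same matrix, so the matrix alone cannot say which way the arrows point.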


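For a second example of indeterminacy when the graph is complete,
consider four variables X, Y, Z, W with covariance matrix $\Sigma$ given by

\[
\Sigma =
\begin{pmatrix}
1           & \tfrac12 & \tfrac12 & \tfrac12 \\
\tfrac12 & 1           & \tfrac12 & \tfrac12 \\
\tfrac12 & \tfrac12 & 1           & \tfrac12 \\
\tfrac12 & \tfrac12 & \tfrac12 & 1
\end{pmatrix}
\tag{21}
\]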


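Figure 9 shows two complete path diagrams, both of which are compatible
with the given covariance matrix. In the left-hand panel, X is exogenous
and “causes” Y; then X and Y “cause” Z; finally, X, Y, Z “cause” W.
In panel (b), the flow of “causality” is reversed. The equations
corresponding to the left-hand panel are given as (22); panel (b) is
described in (23):

\[
\begin{aligned}
Y &= \tfrac12 X + \delta_1 \\
Z &= \tfrac13 X + \tfrac13 Y + \delta_2 \\
W &= \tfrac14 X + \tfrac14 Y + \tfrac14 Z + \delta_3
\end{aligned}
\tag{22}
\]

\[
\begin{aligned}
Z &= \tfrac12 W + \epsilon_1 \\
Y &= \tfrac13 Z + \tfrac13 W + \epsilon_2 \\
X &= \tfrac14 Y + \tfrac14 Z + \tfrac14 W + \epsilon_3
\end{aligned}
\tag{23}
\]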


The covariance matrix $\Sigma$ is also compatible with the factor analysis
model (24), where the unobservable exogenous variable U causes all four
observables (right-hand panel of Fig. 9):

\[ X = U + \xi_1, \qquad Y = U + \xi_2, \qquad Z = U + \xi_3, \qquad W = U + \xi_4. \tag{24} \]

In each system of Eqs. (22)-(24), the error terms are assumed to be
independent and normally distributed with mean 0; error terms are
independent of the exogenous variable. As a technical matter, the covariance
matrix (20) is faithfully represented by both graphs in Fig. 8. Likewise, the
covariance matrix (21) is faithful to Fig. 9(a) and to 9(b). Proofs may be
based on (48) below.

Fig. 9. Two complete path diagrams and a factor analysis model, all having the same
covariance matrix.
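
The compatibility of the three models can be illustrated numerically. The sketch below rests on several assumptions that are not spelled out in the text: that the matrix (21) has unit variances and all covariances equal to 1/2, that the coefficients in (22) and (23) are 1/2, 1/3 and 1/4 as appropriate, that the error variances are 3/4, 2/3 and 5/8 (the values needed for unit variances), and that in (24) the factor U and each error term have variance 1/2. NumPy is assumed, and `recursive_cov` is a hypothetical helper.

```python
import numpy as np

# Assumed form of the covariance matrix (21): unit variances, covariances 1/2.
SIGMA = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))


def recursive_cov(A, err_var):
    """Covariance of V when V = A V + e, with independent errors e."""
    m = np.linalg.inv(np.eye(len(err_var)) - A)
    return m @ np.diag(err_var) @ m.T


# Equations (22), variables ordered X, Y, Z, W; row i holds the
# coefficients in the equation for variable i.
A_fwd = np.array([
    [0,     0,     0,     0],
    [1 / 2, 0,     0,     0],
    [1 / 3, 1 / 3, 0,     0],
    [1 / 4, 1 / 4, 1 / 4, 0],
])
# Assumed error variances, chosen so that every variable has variance 1.
err_fwd = np.array([1.0, 3 / 4, 2 / 3, 5 / 8])
cov_forward = recursive_cov(A_fwd, err_fwd)

# Equations (23): the same coefficients with the order of variables reversed.
A_rev = A_fwd[::-1, ::-1]
cov_reversed = recursive_cov(A_rev, err_fwd[::-1])

# Factor model (24): unit loadings on U, assuming var(U) = 1/2 and
# error variance 1/2 for each observable.
loadings = np.ones(4)
cov_factor = 0.5 * np.outer(loadings, loadings) + np.diag(np.full(4, 0.5))
```

Under these assumptions all three implied covariance matrices coincide with `SIGMA`.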

To sum up, if a covariance matrix is faithful to a complete graph (with
all pairs of vertices joined by arrows), it is faithful to many such graphs.
Then correlational methods cannot tell the causes from the effects. SGS
[62] techniques work best when the graph is sparse; that is, relatively few
pairs of vertices are joined by arrows (Section 6).

11.4. Identifiability and Consistency

The focus continues to be on linear models. In statistical terminology,
models are “identifiable” when they make different predictions about
observables. For example, suppose you have two models for your data. If,
for all data sets,

\[ P(\text{data} \mid \text{model 1}) = P(\text{data} \mid \text{model 2}), \]

there is an obvious problem: the data cannot distinguish between the
models. If a path model is complete, or the faithfulness assumption is not
imposed, then the graph underlying a covariance matrix is not identifiable;
that is the message of Sections 11.1-11.3. By way of illustration, the
models in Fig. 7 are identifiable only if faithfulness holds.

However, even if we assume that a covariance matrix is faithful to a
graph that is not complete, there may be several such graphs [62, p. 89].
For example, the following three graphs can generate the same covariance
matrix:

\[ X \to Y \to Z, \qquad X \leftarrow Y \leftarrow Z, \qquad X \leftarrow Y \to Z. \]

Thus, SGS do not seem to have succeeded in defining a class of graphs and
covariance matrices for which identifiability holds [62, p. 194].
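
A small check shows how a chain in either direction and a fork with Y in the middle lead to one covariance matrix. This is an illustrative sketch assuming NumPy and standardized variables; the correlations `r` and `s` and the helper `implied_cov` are hypothetical.

```python
import numpy as np

X, Y, Z = range(3)
r, s = 0.6, 0.5   # hypothetical correlations for the pairs (X, Y) and (Y, Z)


def implied_cov(edges):
    """Covariance implied by a path diagram on standardized X, Y, Z.

    `edges` maps (child, parent) to a path coefficient. Every node in the
    graphs used here has at most one parent, so the error variance of a
    node is 1 minus its squared coefficient.
    """
    A = np.zeros((3, 3))
    for (child, parent), coef in edges.items():
        A[child, parent] = coef
    err_var = 1.0 - (A ** 2).sum(axis=1)
    m = np.linalg.inv(np.eye(3) - A)
    return m @ np.diag(err_var) @ m.T


chain = implied_cov({(Y, X): r, (Z, Y): s})      # X -> Y -> Z
reverse = implied_cov({(Y, Z): s, (X, Y): r})    # X <- Y <- Z
fork = implied_cov({(X, Y): r, (Z, Y): s})       # X <- Y -> Z
```

All three matrices are equal, with cov(X, Z) = rs in each case, so no correlational procedure can choose among the three graphs.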




In statistical terminology, estimators are “consistent,” provided that, as
the sample gets larger and larger, these estimators come closer and closer
to the population parameters. If the parameters are not identifiable,
however, consistency is problematic.

SGS [62] seem to claim that their algorithms will find all the path
diagrams compatible with a given covariance matrix. However, the theorems
suggest that the algorithms will at best find one such graph. SGS also
seem to claim that their algorithms are consistent. However, without an
identifiability theory for linear models, they cannot really be talking about
consistency.

Statisticians do have the weaker notion of “Fisher consistency,” named 
after R. A. Fisher: when applied to data for the whole population, an 
estimator should reproduce the population parameters exactly. Theorems 
like 5.1 in SGS [62, p. 405] seem to demonstrate the analog of Fisher 
consistency, rather than anything stronger. Such theorems show that, given 
the population covariance matrix, the algorithms will produce one graph 
consistent with that matrix. 


11.5. Methodological Contributions

There is a connection between the theory of “directed acyclic graphs”
(DAGs) and the conditional independence of random variables. (See
Darroch et al. [10], Kiiveri and Speed [34, 61], Pearl [43, 44], Verma and
Pearl [49, 67], Geiger [22].) Much of this work is reviewed in SGS [62].
However, the mathematics of nonlinear causal diagrams seems to be
irrelevant to the big question: how do we infer causation from association?

Most of the applications in SGS are linear, i.e., based on path models. The
“nonlinear causal diagrams” turn out to be multinomial models for
categorical data; examples are in [62, pp. 147-151]. The issues about causation
are quite similar to those for linear models, although the technical details
are different.

This section will focus on path models. To describe the novelty in the
SGS approach to estimation, suppose you have data from a path model
and wish to estimate the model. Consider two cases:

Case I. You know the classification of variables as to level; that is,
you know which variables are at level 0, which are at level 1, and so forth.

Case II. You do not know the classification of variables as to level.

In Case I, SGS [62] have little to tell us about estimation; as to
confounding, see Section 12.1. Some of their algorithms seem to be
equivalent to regression; others may be less efficient. In Case II, SGS try
to estimate the classification of variables as well as the path coefficients.
That is the methodological contribution. To estimate the classification,
SGS must impose the faithfulness assumption (Section 11.2). It is
disappointing that SGS do not pin down the sense in which their algorithms are
successful (Section 11.4).


12. MORE EXAMPLES AND SOME THEORY 

Section 12.1 explains how the faithfulness assumption and conditional
independence are supposed to eliminate confounding. Section 12.2 discusses
omitted variables. Sections 12.3-12.5 revisit two examples from a
more mathematical perspective; the idea is to show the limits of
correlational methods.

12.1. Faithfulness, Conditional Independence, and Confounding 

The problems created by unobservable variables are well known. As
indicated above, SGS [62] handle such problems by imposing the faithfulness
assumption. More specifically, the assumption is used to rule out
confounding. If confounding can be eliminated, the goal is in sight:
association may soon be converted into causation. This section, which is
based on work by Jamie Robins (personal communication), examines the
logic in more detail. Also see Pearl and Verma [49].

With some models, exact conditional independence forces a choice: 

• either there is no confounding by unmeasured common causes, 

• or the faithfulness assumption is violated. 

Near-independence is not good enough; associations may then be entirely 
spurious. Thus, causal inferences made by the SGS technique need exact 
conditional independence as well as the faithfulness assumption. 

This use of the faithfulness assumption has some theoretical interest. 
However, in order to base empirical work on such mathematical ideas, it 
would seem necessary to resolve the following questions, which SGS have 
not addressed: 

• Can the basic models be validated? 

• Can exact conditional independence be demonstrated? 

• Given exact independence, why is exact cancellation of confounded
effects overwhelmingly less likely than the total absence of such effects?

Fig. 10. The faithfulness assumption, conditional independence, and
confounding. Variables X, Y, Z are observable; U is unobservable. Arrows
represent causation, not just association. The lower-case letters on the
arrows denote path coefficients. If a path coefficient vanishes, the
corresponding arrow must be deleted.

As a practical matter, exact independence seems quite unusual. However,
the theory is worth understanding, and an example will make the
position clearer. Figure 10 shows a relatively simple diagram where
faithfulness and conditional independence would eliminate confounding. The
arrows denote causation, not mere association. Variables X, Y, Z are
observable; U is unobservable. Such unobservables are also called
“confounders” or “unmeasured common causes.” The joint distribution is
normal, and variables are standardized to have mean 0 and variance 1. The
covariance matrix for all four variables is shown in (25).*



         U              X               Y               Z

U        1
X        d              1
Y        e              de              1
Z        f + ad + be    a + bde + fd    b + ade + fe    1          (25)


Of course, only the covariance matrix (26) of the observables (X, Y, Z)
can be estimated from the data. In particular, de is determined from the
observables, as cov(X, Y):



         X               Y               Z

X        1
Y        de              1
Z        a + bde + fd    b + ade + fe    1          (26)


* Covariance matrices are symmetric; only the lower triangular part is shown.
Entries are assumed to be positive but less than 1. The matrix is assumed to
be positive definite.

It may help to review the idea of faithfulness, in the context of our
example. Faithfulness is an assumption about unobservables; more
specifically, it is a constraint on the relationship between the full covariance
matrix (25) and the graph in Fig. 10. The assumption amounts to this:
independence relationships (conditional and unconditional) are determined
by the presence or absence of arrows in the diagram, not specific
parameter values.

In particular, if the covariance matrix (25) is faithful to the diagram in 
Fig. 10, you cannot set any of the path coefficients to 0, except by deleting 
the corresponding arrow. An arrow from X to Z, say, entails that X has 
some causal effect on Z, no matter how small that effect may turn out to 
be. 

I return to more conventional issues. In our example, the parameter of 
interest is b, the causal effect of Y on Z. Due to the unmeasured 
confounder U, a regression of Z on X and Y produces a biased estimate 
of b. By a slightly tedious calculation, the coefficient of Y in the
regression equation is

b + fe(1 - d^2)/(1 - d^2 e^2). (27)

(For details on multiple regression, see the Appendix.) The bias in the
regression estimate is the second term in (27). From a slightly different
perspective, cov(Y, Z) in (26) measures the total association between Y
and Z. Part of this association is real: b measures the causal effect of Y
on Z. Alas, part of the association is spurious: ade + fe represents the
effects of the confounder U.
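
The "slightly tedious calculation" behind (27) can be checked with exact arithmetic. The sketch below is only an illustration: the function names and the parameter values are hypothetical, not taken from the paper. It builds the entries of (26) from the path coefficients, solves the two normal equations for the regression of Z on X and Y, and compares the coefficient of Y with formula (27).

```python
from fractions import Fraction as F


def observable_cov(a, b, d, e, f):
    """Off-diagonal entries of the covariance matrix (26) of X, Y, Z."""
    cov_xy = d * e
    cov_xz = a + b * d * e + f * d
    cov_yz = b + a * d * e + f * e
    return cov_xy, cov_xz, cov_yz


def coef_of_y(a, b, d, e, f):
    """Population coefficient of Y when Z is regressed on X and Y.

    With unit variances and r = cov(X, Y), the normal equations are
        c_x + r * c_y = cov(X, Z)
        r * c_x + c_y = cov(Y, Z),
    and eliminating c_x gives the expression returned here.
    """
    r, cov_xz, cov_yz = observable_cov(a, b, d, e, f)
    return (cov_yz - r * cov_xz) / (1 - r * r)


# Hypothetical path coefficients, chosen only for illustration.
a, b, d, e, f = F(1, 10), F(3, 10), F(1, 2), F(2, 5), F(1, 5)

lhs = coef_of_y(a, b, d, e, f)
rhs = b + f * e * (1 - d**2) / (1 - d**2 * e**2)  # formula (27)
bias = lhs - b  # the second term in (27)

print(lhs == rhs, bias)
```

With these particular values the two expressions agree exactly, and the bias is nonzero because f, e, and 1 - d^2 are all nonzero.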

The goal is to separate the real part of the association from the spurious 
part. The familiar obstacle is that we have only (26), not (25). And (26) 
does not suffice to separate b + ade + fe into its components. But, SGS 
might say, suppose that X and Z are conditionally independent given Y: 

cov(X, Z | Y) = 0. (28)

By (48) below, this means

cov(X, Z) = cov(X, Y) × cov(Y, Z). (29)

A bit of algebra based on (25) shows that (29) is equivalent to

a(1 - d^2 e^2) + df = d e^2 f. (30)

Although de is known and 0 < de < 1, there are many possible ways to
solve Eq. (30). At this point, SGS would invoke the faithfulness
assumption, concluding that

a = 0, f = 0. (31)

The implication is that we have to remove the arrow from X to Z, as well 
as the arrow from U to Z. 
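
A small exact computation may make the role of faithfulness in this step concrete. The sketch below is illustrative only: the function name and parameter values are hypothetical, and the second case allows the path coefficient a to be negative, which is an assumption of the sketch. It evaluates cov(X, Z | Y) from the entries of (26), once for the faithful solution a = 0, f = 0, and once for an unfaithful solution in which a nonzero a exactly cancels a nonzero f.

```python
from fractions import Fraction as F


def partial_cov_xz_given_y(a, b, d, e, f):
    """cov(X, Z | Y) for standardized, jointly normal X, Y, Z, using (26).

    Because var(Y) = 1, the conditional covariance is
    cov(X, Z) - cov(X, Y) * cov(Y, Z).
    """
    cov_xy = d * e
    cov_xz = a + b * d * e + f * d
    cov_yz = b + a * d * e + f * e
    return cov_xz - cov_xy * cov_yz


# Hypothetical values for the coefficients that stay fixed.
b, d, e = F(3, 10), F(1, 2), F(2, 5)

# Faithful solution of (30): both arrows into Z are absent.
faithful = partial_cov_xz_given_y(F(0), b, d, e, F(0))

# Unfaithful solution of (30): f > 0, and a is chosen so the terms cancel.
f = F(1, 5)
a_cancel = -d * f * (1 - e**2) / (1 - d**2 * e**2)
unfaithful = partial_cov_xz_given_y(a_cancel, b, d, e, f)

print(faithful, unfaithful, a_cancel)
```

Both cases give exact conditional independence, so (28) alone cannot tell them apart. The faithfulness assumption is what rules out the second case.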



Confounding has now been eliminated. On this basis, cov(Y, Z) = b, the
whole of the association is real, and regression produces an unbiased 
estimate for the causal effect of Y on Z. At last, association has been 
converted into causation. (Of course, quite a lot of causality was built into 
Fig. 10 from the beginning—by assumption.) 

Those were the implications of exact conditional independence. On the
other hand, suppose we have approximate conditional independence:
cov(X, Z|Y) = .00001. Now the faithfulness assumption has no force.
Given the covariances in (26), we can match them by suitable choice of the
other parameters, even if a = b = 0.*

With approximate conditional independence, observed associations can
be entirely spurious. Thus, even in the realm of mathematics, faithfulness
and conditional independence preclude confounding only when the
independence is exact. To make the contrast sharper, let us assume
faithfulness:

• If cov(X, Z|Y) = 0, then the association between Y and Z is purely
causal; the effects of the unmeasured common cause U do not confound
the relationship between Y and Z.

• If cov(X, Z|Y) = .00001, then confounding by unmeasured common
causes may account for all of the observed association between Y and Z.

A similar problem must be considered when estimating path models 
from data (Section 11). Exact conditional independence, together with the 
faithfulness assumption, often permits us to identify the path diagram from 
the covariance matrix. However, approximate conditional independence is 
not enough; then, the covariance matrix will be faithful to a variety of 
complete graphs. 

A final example is the Timberlake-Williams model (Section 10). This
model explains political exclusion (PO) in terms of foreign investment (FI),
energy development (EN), and civil liberties (CV); the sample correlation
matrix was shown in Table VI. Consider three scenarios for the “true”
correlation matrix ρ:

(i) Suppose ρ happens to equal the sample correlation matrix.
Then, faithfulness obtains.

(ii) Suppose the true correlation ρ(PO, FI) between foreign investment
and political exclusion happens to vanish exactly. Then, the
Timberlake-Williams model violates the faithfulness condition; presumably,
that is SGS’s real complaint.

(iii) If ρ(PO, FI) = .00001, faithfulness is restored. According to the
SGS criteria, Timberlake and Williams are back in business.

Within the framework of path models, scenario (ii) cannot be rejected at
conventional significance levels; neither can (iii); and (i) represents our
best estimate, subject to large uncertainties. SGS seize on hypothesis (ii),
the only one that legitimates their critique. They are balking at shadows.

* This matching assumes, for instance, that any two of the variables have
positive covariance, given the third. To avoid violating the faithfulness
assumption, if you set a and b to 0, erase the corresponding arrows; if that
is distasteful, set a and b to small but positive values. The SGS logic would
apply to a wide variety of diagrams; however, an arrow from Y to X, no matter
how small the coefficient, spoils the show.

12.2. Omitted Variables 

The problem of omitted variables was raised by Cliff Clogg at the Notre
Dame conference, and this section paraphrases one of his points. There is
a response variable Y, with explanatory variables X and Z; these may be
construed as vectors. Suppose the data are generated according to the
“true” model (32T):

Y = Xβ + Zγ + ε. (32T)

The parameter vectors β and γ are unknown and to be estimated from
data by regression; it is β that is of primary interest. Subjects are assumed
to be independent and identically distributed; (X, Z) and the error term ε
are independent and jointly normal; all variables have expected value 0.
Consider, too, the “restricted” model (32R), where β_R is defined so that
E[Y|X] = Xβ_R; the constituents of (32R) may be computed from the true
model.*

Y = Xβ_R + δ. (32R)

* Indeed, β_R = β + α, where α is obtained by the regression of Zγ on X. In
other terms, Zγ = Xα + η, where η is normal with mean 0, independent of X.
Then δ = ε + η. It may be seen that α depends linearly on γ.

In principle, the variables X, Y, and Z are all observable; X and Z may
be correlated. However, investigators who do not know that Z is relevant
may fit the restricted model R rather than the true model T. If so, the
estimate of β can be quite biased. In the vernacular, β_R includes the
effect of X on Y through Z. The covariance matrix of (X, Y) cannot
distinguish between the two models, because the matrix can be generated
by either model. Therefore, no statistical procedure based on that matrix
can tell you whether the restricted model is right or wrong.*
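
The relation between the true and restricted models can be illustrated in the simplest case, where X and Z are scalars. The sketch below is an assumption-laden illustration, not part of the original argument: the function names and all numerical values are hypothetical. It computes β_R = β + α with α taken from the regression of Zγ on X, and confirms that the true and restricted models imply the same covariance matrix for (X, Y).

```python
from fractions import Fraction as F


def restricted_model(beta, gamma, var_x, var_z, cov_xz, var_eps):
    """Return (beta_R, var_delta) for scalar X and Z.

    alpha is the slope in the regression of Z*gamma on X; eta is the
    residual of that regression, and delta = eps + eta.
    """
    alpha = gamma * cov_xz / var_x
    var_eta = gamma**2 * var_z - alpha**2 * var_x
    return beta + alpha, var_eps + var_eta


def cov_xy_and_var_y_true(beta, gamma, var_x, var_z, cov_xz, var_eps):
    """cov(X, Y) and var(Y) implied by the true model Y = X*beta + Z*gamma + eps."""
    cov_xy = beta * var_x + gamma * cov_xz
    var_y = beta**2 * var_x + gamma**2 * var_z + 2 * beta * gamma * cov_xz + var_eps
    return cov_xy, var_y


# Hypothetical population values, chosen only for illustration.
beta, gamma = F(1), F(2)
var_x, var_z, cov_xz, var_eps = F(1), F(1), F(1, 2), F(1)

beta_r, var_delta = restricted_model(beta, gamma, var_x, var_z, cov_xz, var_eps)
true_moments = cov_xy_and_var_y_true(beta, gamma, var_x, var_z, cov_xz, var_eps)
restricted_moments = (beta_r * var_x, beta_r**2 * var_x + var_delta)

print(beta_r, true_moments, restricted_moments)
```

In this example β_R is twice β, yet the second moments of (X, Y) are identical under the two models, which is the sense in which the covariance matrix of (X, Y) cannot distinguish them.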

12.3. On the Direction of Causality 

This section uses “cause” in its ordinary (perhaps undefinable) meaning, 
not as shorthand for certain kinds of covariation. I return to Judea Pearl’s
example, shown in Fig. 2(a). Given the covariance matrix for X, Y, and Z, 
the SGS [62] algorithm will produce the graph shown in panel (a). If you 
tell the algorithm that omitted variables are a possibility, it will tell you 
that Y cannot cause X or Z. 

In the example, X, Y, and Z are the only observables, and their 
covariance matrix is faithful to the graph in Fig. 2(a). I claim that such 
information cannot by itself determine the direction of the causal flow. To 
substantiate this claim, I now construct two theories. In both, the observables
X, Y, and Z will have the same covariance matrix, faithful to the
graph in Fig. 2(a). However, the direction of the causal flow will be 
different in the two theories. 

Theory 1. I first generate X, Z, U as independent N(0, 1) variables; U
is an unobservable error term. (If you want to intervene and change X or
Z, now is your moment.) Then

Y = X + Z + U. (33)

According to Theory 1, X and Z cause Y, as suggested by Fig. 2(a).

Theory 2. I first generate Y as N(0, 3). (If you want to intervene and
change Y, now is your moment.) After a suitable pause, so that time’s
arrow will delineate the flow of causality, I generate the errors V_1, V_2, and
V_3 as independent N(0, 1/3) variables and then produce X, Z, and U,
according to

X = (1/3)Y + V_1 - V_2
Z = (1/3)Y + V_2 - V_3
U = (1/3)Y + V_3 - V_1. (34)

In the second theory, Y causes X and Z. As far as the observables are
concerned (namely, the joint distribution of X, Y, and Z), Theories 1
and 2 agree. Furthermore, the joint distribution is faithful to the graph in
Fig. 2(a). But the direction of causality is determined neither by the data
nor by the mathematics. With correlational methods, causality follows
from the assumptions about the unobservables.

* See Clogg and Haritou [6], who make the following very interesting point.
Adding variables that are correlated with ε can also bias the estimate of β;
this “included variable” bias can be just as troublesome as the more familiar
“omitted variable” bias; the latter problem cannot be solved by throwing
variables into the model. The SGS treatment of omitted variables was
discussed in Section 12.1.
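
The agreement between the two theories can be verified by computing the covariance matrix of (X, Y, Z) under each. The sketch below is illustrative: the helper name is hypothetical, and it assumes that in Theory 2 X = Y/3 + V_1 - V_2 and Z = Y/3 + V_2 - V_3, with the V's independent N(0, 1/3) and independent of Y. Each theory is written as a linear map applied to independent inputs, and the implied covariance is A S A^T.

```python
from fractions import Fraction as F


def cov_of_linear_map(A, S):
    """Return A S A^T for small matrices given as lists of rows."""
    n, m = len(A), len(S)
    AS = [[sum(A[i][k] * S[k][j] for k in range(m)) for j in range(m)]
          for i in range(n)]
    return [[sum(AS[i][k] * A[j][k] for k in range(m)) for j in range(n)]
            for i in range(n)]


# Theory 1: inputs (X, Z, U) are independent N(0, 1); Y = X + Z + U.
# Rows of A1 express X, Y, Z in terms of the inputs.
A1 = [[1, 0, 0],
      [1, 1, 1],
      [0, 1, 0]]
S1 = [[1, 0, 0],
      [0, 1, 0],
      [0, 0, 1]]

# Theory 2: inputs (Y, V1, V2, V3) are independent, Y ~ N(0, 3), V_i ~ N(0, 1/3).
third = F(1, 3)
A2 = [[third, 1, -1, 0],   # X = Y/3 + V1 - V2
      [1, 0, 0, 0],        # Y
      [third, 0, 1, -1]]   # Z = Y/3 + V2 - V3
S2 = [[3, 0, 0, 0],
      [0, third, 0, 0],
      [0, 0, third, 0],
      [0, 0, 0, third]]

cov_theory_1 = cov_of_linear_map(A1, S1)
cov_theory_2 = cov_of_linear_map(A2, S2)

print(cov_theory_1)
print(cov_theory_2 == cov_theory_1)
```

Since both theories are jointly normal with mean 0, equal covariance matrices imply equal joint distributions for the observables, even though the causal stories run in opposite directions.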

12.4. The AFQT Problem

SGS [62, p. 242] seem to claim that, as a demonstrable mathematical
fact, their procedures will find the right answers:

Assuming the right variables have been measured, there is a straightforward
solution to these problems: apply the PC, FCI, or other reliable algorithm, and
appropriate theorems from the preceding chapters, to determine which X
variables influence the outcome Y, which do not, and for which the question
cannot be answered.... Then estimate the dependencies by whatever methods
seem appropriate and apply the results of the previous chapter to obtain
predictions of the effect of manipulating the X variables. No extra theory is
required. We will give a number of illustrations ....

The first example given by SGS to illustrate this claim is AFQT (Section 
9 above). To demonstrate that SGS are exaggerating more than a little, I 
pose a sharp mathematical question with the essential features of the 
AFQT problem. Then, I show the question to be undecidable by correlational
methods. (Of course, when applied to the real example, both SGS
and ordinary least squares made the right guess.) 

To set up the question, assume that X and Y are random variables; X is
a vector; Y is scalar.

Y is a linear combination of X’s, with fixed weights. (35)

The observables are Y and V_1, ..., V_7. (36)

Some V’s are X’s; some V’s are ringers. (A “ringer” is a variable that does
not enter into the linear combination for Y.) There are also unobservables,
including the X’s that are not V’s. Assume too that

The full joint distribution is multivariate normal, with mean 0. (37)

You are given the covariance matrix for the observables, but not the full
covariance matrix. The problem is to say which of the V’s are X’s and
which are ringers. I claim this problem is not solvable, because I can
produce two different theories leading to different classifications of the
V’s, but having the same joint distribution for the observables.




Theory 1. I use the covariance matrix for the seven observable subtests
V_1 = NO, ..., V_7 = GS, together with the three unobservable subtests,
CS, AS, and PC. (The subtests are listed in Table VIII, Section 12.5
below.) The full distribution is defined to be jointly normal, and all
variables have mean 0. Let Y = .5 × NO + AR + WK + PC, where NO,
AR, and WK are observable but PC is unobservable. In this theory,
V_1, V_2, V_3 are X’s; the remaining V’s are ringers. This theory happens to
have been more or less correct, prior to 1989; see Eq. (42) in Section 12.5.

Theory 2. Again, I use the covariance matrix for the seven observable
subtests V_1 = NO, ..., V_7 = GS, together with the other three unobservable
subtests CS, AS, PC. I create an auxiliary variable U, which is
independent of the 10 subtests and has small variance. The distribution of
these 11 variables is defined to be jointly normal, and all variables have
mean 0. There are three additional unobservables, defined as

T_1 = .25(AR + NO) + .5PC + U, (38)

T_2 = .25(WK + NO) + .5PC + U, (39)

T_3 = .75(AR + WK) - 2U. (40)

Let

Y = T_1 + T_2 + T_3. (41)

In Theory 2, T_1, T_2, T_3 are the unobservables; all the V’s are ringers. The
auxiliary variables U, CS, AS, PC serve only to define the joint distribution.

Theory 1 and Theory 2 provide the same joint distribution for the
observables. Therefore, no statistical procedure based on the joint
distribution (like the SGS algorithms or any other correlational methods) can
adjudicate between the two theories.
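
The reason the two theories agree on the observables is that Y is the same linear combination of the subtests in both. A minimal sketch of that bookkeeping follows; the dictionary representation and the helper name are hypothetical conveniences, and the weights are those of (38)-(41) and of Theory 1.

```python
from fractions import Fraction as F


def add_combinations(*terms):
    """Add linear combinations stored as {variable name: weight} dictionaries."""
    total = {}
    for term in terms:
        for name, weight in term.items():
            total[name] = total.get(name, 0) + weight
    # Drop variables whose weights cancel exactly.
    return {name: weight for name, weight in total.items() if weight != 0}


# Theory 2: the three unobservables defined in (38)-(40).
T1 = {"AR": F(1, 4), "NO": F(1, 4), "PC": F(1, 2), "U": F(1)}
T2 = {"WK": F(1, 4), "NO": F(1, 4), "PC": F(1, 2), "U": F(1)}
T3 = {"AR": F(3, 4), "WK": F(3, 4), "U": F(-2)}

y_theory_2 = add_combinations(T1, T2, T3)  # Y as defined in (41)

# Theory 1: Y = .5 * NO + AR + WK + PC.
y_theory_1 = {"NO": F(1, 2), "AR": F(1), "WK": F(1), "PC": F(1)}

print(y_theory_2 == y_theory_1)
```

The auxiliary variable U cancels in the sum, so Y has the same relationship to the subtests under either theory. Only the labeling of which variables count as X’s differs.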

This section and the previous one demonstrate the obvious: you cannot 
infer cause and effect relationships by doing arithmetic on a correlation 
matrix, because association is not causation. The mathematical development
in SGS avoids such problems only by imposing more or less arbitrary
conditions (like faithfulness) on unobservable variables, as discussed in 
Sections 11.2 and 12.1. 

In the present section, neither Theory 1 nor Theory 2 fits into the SGS
framework; Y is a deterministic function of the explanatory variables, with
no stochastic error term: see Eq. (35). Furthermore, if U and PC are
treated as variables rather than error terms in (38)-(40), the joint
distribution in Theory 2 is, presumably, unfaithful to its causal graph. Similar
comments apply to the previous section.

12.5. Institutional Background on the AFQT 

The “Armed Services Vocational Aptitude Battery” (ASVAB) has 10
subtests, including the seven listed in Table IV, Section 9 above. All 10 are
shown in Table VIII.



TABLE VIII
The 10 Subtests in ASVAB

 1.  Numerical Operations         NO
 2.  Word Knowledge               WK
 3.  Arithmetical Reasoning       AR
 4.  Mathematical Knowledge       MK
 5.  Electronics Information      EI
 6.  Mechanical Comprehension     MC
 7.  General Science              GS
 8.  Coding Speed                 CS
 9.  Auto & Shop Information      AS
10.  Paragraph Comprehension      PC

Notes. The first seven were analyzed by SGS. ASVAB Form 17, July 1990.


Until January 1989, the AFQT was computed as

AFQT = .5 × NO + AR + WK + PC. (42)

After that date, NO was replaced by MK; a “verbal” score VE was defined
as VE = WK + PC; and terms were standardized to have mean 0 and
variance 1 on some calibration data, the “NORC 1985 sample.” AFQT
was redefined as

AFQT = MK_Z + AR_Z + 2 × VE_Z, (43)

where the subscript Z denotes standardization. Throughout the period,
raw scores were by Congressional requirement converted to percentiles
based on the NORC sample. Presumably, the data used by SGS [62] come
from 1988 or before, since they pick up formula (42) rather than (43); see
Section 9 above.*

* SGS [62] appear to be considering raw scores, and I follow suit. The
material in this section was reported by Larry Hanser (personal
communication); he refers to Welsh et al. [68, p. S, Table 3] and Eitelberg
[14, p. 73].


13. RESPONSES

Formal statistical inference is, by its nature, conditional. If assumptions
A, B, C,... hold, then H can be tested against the data. However, if
A, B, C,... remain in doubt, so must inferences about H. Indeed, the
statistical calculations may prove to be quite misleading.

Many assumptions are made but only a few are tested. Those made
without testing are called “maintained hypotheses.” They are usually
statistical and often rather technical: linearity, independence, exogeneity,
etc. Careful scrutiny of such assumptions would therefore seem to be a
critical part of empirical work.

In the social sciences, however, statistical assumptions are rarely made 
explicit, let alone validated. Questions provoke reactions that cover the 
gamut from indignation to obscurantism. We know all that. Nothing is
perfect. Linearity has to be a good first approximation. The assumptions are
reasonable. The assumptions do not matter. The assumptions are conservative. 
You cannot prove the assumptions are wrong. The biases will cancel. We can 
model the biases. We are only doing what everybody else does. Now we use 
more sophisticated techniques. What would you do? The decision-maker has to 
be better off with us than without us. We all have mental models; not using a 
model is still a model. 

With the SGS approach, responses are more subtle but no more empirical.
Proponents often seem to take a Bayesian stance; faithfulness is
justified on the grounds that the exceptional cases have measure 0 and
must therefore be viewed as negligible a priori.* However, the SGS
approach is frequentist not Bayesian; the simulations, being done on
finite-state computers, must concentrate in a set of measure 0; and the
SGS class of models has measure 0 within larger classes of models. Indeed,
from my perspective, the whole class of path models seems rather unlikely,
given the intensity of the research effort and the paucity of convincing
examples. The assumptions that diagrams are sparse and faithful stretch
credibility even further.

* The “measure” here is the uniform distribution in Euclidean space, e.g., length, area,
volume.... The SGS argument [62, p. 95] seems to be a variation on Laplace's “principle of
insufficient reason”: see Stigler [64, p. 127].

Attempts have also been made to justify the faithfulness assumption by
appeals to continuity. If a covariance matrix is unfaithful, small changes to
parameter values make it faithful. However, the same argument can be
turned against correlational methods. For example, if a covariance matrix
is faithful to an incomplete graph, small changes to hidden parameters
make the graph complete and vitiate the SGS search procedures. Section
12.1 points to another kind of instability in the SGS framework. The
continuity defense (like the Bayesian argument) reflects an aesthetic
judgment about modeling styles. Taste is no substitute for empirical
verification.

The SGS criteria for causality may also be defended as follows: it is
unlikely that anything could produce the patterns of intercorrelation
identified by SGS, other than causation; thus, correlational methods shift




the burden of argument. Figures 5 and 6 should dispose of this idea. In 
real examples, the patterns identified by the SGS search algorithms can 
hardly represent cause-and-effect relationships. The burden would seem to 
be on the modelers: how can they recommend an algorithm that gives such 
results? 

Proponents of modeling can also be heard to argue that all of us make
assumptions about unobservables. However, what is unobservable with one
design may become observable with another. And some investigators still
deal with unobservables the hard way, by doing the right studies. For
example, take Fisher's “constitutional hypothesis”: there may be a genetic
factor that predisposes you to smoke and to get lung cancer, heart disease,
etc.* This putative genetic factor is the unobservable common cause for
smoking and illness.

* See SGS [62, pp. 298-299].

The epidemiologists did not deal with the constitutional hypothesis by
introducing special assumptions. Instead, they studied the matter empirically,
using data from twin studies. For a recent report on the Swedish twin
registry, see Floderus et al. [16]. On the Finnish twin registry, see Kaprio
and Koskenvuo [31]. Data on the Danish twin registry are fragmentary.
There are forthcoming data on the U.S. twin registry, which are quite
strong [72]. The numbers on lung cancer are suggestive, but still small: this
is a rare disease, even among smokers. The data on heart disease and total
mortality, however, make the constitutional hypothesis untenable.

13.1. A Comment from Judea Pearl

Judea Pearl (personal communication) writes that

Correlation-based model-searching schemes produce causal inferences with
only limited guarantees. Yet such schemes have potential, if conducted under
conditions that screen out accidental independencies while maintaining
structural independencies, for example, longitudinal studies under slightly varying
conditions. This assumes, of course, that under such varying conditions the
parameters of the model will be perturbed, while its structure remains stable.
Maintaining such delicate balance under changing conditions may be hard in
real-life studies. However, considering the alternative of resorting to controlled,
randomized experiments, such longitudinal studies are still an exciting
opportunity.

Additionally, any investigator who is searching for a causal model knowing
that the parameters might be tied together by some hidden equation, like (17)
[Section 11.2], is wasting time (and public funds). Such a model, even if correct,
is bound to be useless, because without the assumption of autonomy (i.e., that
each parameter can be perturbed without altering the others), the model
cannot predict the effect of interventions or other changes....

Also see Pearl [45]; Pearl and Wermuth [50].



14. OTHER LITERATURE 

There is an extensive literature on the evaluation of models, going back
at least to the Keynes-Tinbergen exchange (Keynes [32, 33]; Tinbergen
[66]). Also see [37, 38]. For more recent discussions, with other citations to
the literature, see Freedman [18, 19]. Many authors have tried to explain
the basis for inferring causation by using regression. See, for example, [52,
53], or [28, 29]. Of enthusiastic views on social-science modeling, there is
no shortage; see, for instance, [60], or [2]. For recent discussions of causal
modeling, see Humphreys and Freedman [73], Cox and Wermuth [8], or
Pearl [74].


15. CONCLUSIONS 

SGS [63] have not succeeded in clarifying the circumstances under which
causal inferences can be drawn from observed associations, nor have they
invented a reliable engine for performing this feat. Their algorithms have
some technical interest, but will make causal inferences only when causation
is assumed in the first place. To be more explicit: If we assume that
the arrows in a path diagram represent causation rather than association,
and we also assume that the path diagram can be estimated from data,
then indeed SGS can infer causation from association. The faithfulness
assumption and exact conditional independence will together eliminate
certain kinds of confounding. Even so, causality is assumed into the picture
at the beginning, not proved at the end. As Nancy Cartwright says, “No
causes in, no causes out.”*

* Cartwright [5, Chaps. 2, 3]. Also see Pearl and Verma [49].

The larger problem remains. Can quantitative social scientists infer
causality by applying statistical technology to correlation matrices? That is
not a mathematical question, because the answer turns on the way the
world is put together. As I read the record, correlational methods have not
delivered the goods. We need to work on measurement, design, theory.
Fancier statistics are not likely to help much.


APPENDIX: REGRESSION AND CONDITIONING

For ease of reference, this appendix presents the usual formulas for
computing regressions and conditional covariances. I begin with regression.
Suppose $\xi$ and $\eta$ are random variables; $\xi$ may be a row vector. We
seek the column vector $\beta$ of regression coefficients for $\eta$ on $\xi$. Let




$C = E\{\xi'\xi\}$ and $D = E\{\xi'\eta\}$; the prime denotes matrix transposition.
Assume $C$ is positive definite. Then

$$\beta = C^{-1}D. \tag{44}$$

Now $\eta = \xi\beta + u$, where $u$ is automatically orthogonal to $\xi$. The mean
square of $u$ may be computed as follows:

$$E(u^2) = E(\eta^2) - \beta' C \beta. \tag{45}$$
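
As a rough numerical check of (44) and (45), the sketch below treats the empirical distribution of a simulated sample as the "population," so that expectations become sample averages. The function name `population_regression`, the simulated coefficients, and the sample size are illustrative assumptions, not part of the original text.

```python
import numpy as np


def population_regression(xi, eta):
    """Regression of eta on xi, with expectations taken as sample averages.

    xi  : (N, k) array; each row is one draw of the row vector xi.
    eta : (N,) array of the corresponding draws of eta.
    """
    n_draws = len(eta)
    C = xi.T @ xi / n_draws          # C = E{xi' xi}
    D = xi.T @ eta / n_draws         # D = E{xi' eta}
    beta = np.linalg.solve(C, D)     # (44): beta = C^{-1} D
    u = eta - xi @ beta              # residual in eta = xi beta + u
    return C, D, beta, u


# Hypothetical data: two regressors and arbitrary coefficients.
rng = np.random.default_rng(0)
xi = rng.normal(size=(1000, 2))
eta = xi @ np.array([1.5, -0.5]) + rng.normal(size=1000)

C, D, beta, u = population_regression(xi, eta)

orthogonality = xi.T @ u / len(u)             # E{xi' u}, should vanish
lhs = np.mean(u ** 2)                         # E(u^2)
rhs = np.mean(eta ** 2) - beta @ C @ beta     # E(eta^2) - beta' C beta
print(orthogonality, lhs, rhs)
```

Under these assumptions the orthogonality vector should be zero up to rounding, and the two sides of (45) should agree.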

If $\xi$ and $\eta$ have mean 0, then $C = \operatorname{cov}(\xi)$ and $D = \operatorname{cov}(\xi, \eta)$; also,
$E(u) = 0$. Likewise, if some component of $\xi$ is a nonzero constant,
$E(u) = 0$. If now the variables are jointly normal, $u$ is independent of $\xi$.

I turn to estimation. Recall Eq. (2), repeated here for ease of reference.

$$Y = X\beta + \epsilon. \tag{2}$$

In this equation, $X$ is the “design matrix,” representing the explanatory
variables. There is one row for each unit in the study, and one column for
each variable. The entry in the $i$th row and $j$th column represents the $j$th
variable, as observed on the $i$th unit in the study. $X$ may include a column
of ones if there is to be an intercept in the equation. $Y$ is a column vector
representing the dependent variable, whose $i$th component represents the
value of $Y$ for the $i$th unit in the study. $\epsilon$ is also a column vector, with one
component for each unit in the study, representing the impact on $Y$ of
chance factors unrelated to $X$. Typically, there will be many fewer parameters
than data points, so $\beta$ has relatively few components.

The ordinary least squares estimator for $\beta$ is denoted by a hat and may
be computed as

$$\hat\beta = (X'X)^{-1}X'Y. \tag{46}$$

The covariance matrix for $\hat\beta$, conditional on the design matrix, is computed
as

$$\operatorname{cov}(\hat\beta \mid X) = (X'X)^{-1}\operatorname{var}(\epsilon_i \mid X). \tag{47}$$

Of course, (46) is related to (44); this is seen by defining $(\xi, \eta)$ as a row
chosen at random from $(X, Y)$.
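
One way to see the connection, sketched here with $x_i$ denoting the $i$th row of $X$ and $Y_i$ the $i$th entry of $Y$ (notation assumed for this illustration): if $(\xi, \eta)$ is drawn uniformly at random from the $n$ rows of $(X, Y)$, the expectations in (44) become averages over rows, and the factors of $1/n$ cancel.

```latex
C = E\{\xi'\xi\} = \frac{1}{n}\sum_{i=1}^{n} x_i' x_i = \frac{1}{n} X'X,
\qquad
D = E\{\xi'\eta\} = \frac{1}{n}\sum_{i=1}^{n} x_i' Y_i = \frac{1}{n} X'Y,
\qquad\text{so}\qquad
C^{-1}D = (X'X)^{-1}X'Y = \hat\beta .
```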

The “predicted values” and “residuals” are defined as $\hat Y = X\hat\beta$ and
$e = Y - \hat Y$. The residuals are automatically orthogonal to $X$. The residual
sum of squares, minimized by the choice of $\hat\beta$, is $\mathrm{RSS} = \|e\|^2 = \sum_i e_i^2$. Then
$\operatorname{var}(\epsilon_i \mid X)$ in (47) may be estimated as $\mathrm{RSS}/(n - p)$, where $n$ is the
number of data points and $p$ is the number of explanatory variables.
Variances will be found along the diagonal of the covariance matrix, and
the standard error is computed as the square root of the variance. In





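deriving these formulas, it is assumed that, given $X$, the components of $\epsilon$
are conditionally independent and identically distributed, with mean 0.

Suppose the model has an intercept. Then $R^2$ may be defined as
$R^2 = \operatorname{var}\{\hat Y\}/\operatorname{var}\{Y\}$, where, e.g.,

$$\operatorname{var}\{Y\} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar Y)^2, \qquad \bar Y = \frac{1}{n}\sum_{i=1}^{n} Y_i.$$

If all variables have mean 0, then $R^2 = \hat\beta' X'X \hat\beta/(n \times \operatorname{var}\{Y\})$.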
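
The estimation formulas (46) and (47), together with the residual-based variance estimate and $R^2$, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `ols`, the returned dictionary, and the simulated data are hypothetical, and the explicit matrix inverse is used only to mirror the formulas.

```python
import numpy as np


def ols(X, Y):
    """Ordinary least squares following (46) and (47).

    X : (n, p) design matrix, including a column of ones for the intercept.
    Y : (n,) vector of the dependent variable.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y             # (46)
    Y_hat = X @ beta_hat                     # predicted values
    e = Y - Y_hat                            # residuals
    rss = float(e @ e)                       # residual sum of squares
    var_eps_hat = rss / (n - p)              # estimate of var(eps_i | X)
    cov_beta_hat = var_eps_hat * XtX_inv     # (47) with the estimated variance
    se = np.sqrt(np.diag(cov_beta_hat))      # standard errors
    r2 = np.var(Y_hat) / np.var(Y)           # np.var divides by n, as in the text
    return {"beta_hat": beta_hat, "residuals": e, "rss": rss,
            "se": se, "r2": r2}


# Hypothetical data: intercept plus two explanatory variables.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([2.0, 1.0, -3.0]) + rng.normal(size=n)

fit = ols(X, Y)
print(fit["beta_hat"], fit["se"], fit["r2"])
```

For real work, a least-squares solver such as `np.linalg.lstsq` is usually preferable numerically to forming $(X'X)^{-1}$ explicitly.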

The usual formula for computing conditional covariances may be presented
as follows. Let $n > 2$. Suppose $X_1, X_2, \ldots, X_n$ are jointly normal.
We seek the conditional covariance of $X_1$ and $X_2$, given $X_3, X_4, \ldots, X_n$.
Let $\Sigma$ be the covariance matrix of $X_3, X_4, \ldots, X_n$. Let $\kappa_1$ be the
covariance of $X_1$ with $X_3, X_4, \ldots, X_n$; let $\kappa_2$ be the covariance of $X_2$ with
$X_3, X_4, \ldots, X_n$. We view $\kappa_1$ and $\kappa_2$ as $(n - 2) \times 1$ column vectors. The
conditional covariance is given by

$$\operatorname{cov}(X_1, X_2 \mid X_3, \ldots, X_n) = \operatorname{cov}(X_1, X_2) - \kappa_1' \Sigma^{-1} \kappa_2. \tag{48}$$

The prime denotes matrix transposition. Details on the material in this
appendix may be found in standard texts, for instance, [54].
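
Formula (48) can be sketched as follows. The function name `conditional_cov_12` and the three-variable covariance matrix are assumptions made for illustration; the example takes $X_3$ as a common cause, with $X_1 = X_3 + e_1$ and $X_2 = X_3 + e_2$, where $X_3$, $e_1$, $e_2$ are independent with unit variance.

```python
import numpy as np


def conditional_cov_12(S):
    """cov(X_1, X_2 | X_3, ..., X_n) from the full covariance matrix S, via (48)."""
    S = np.asarray(S, dtype=float)
    kappa1 = S[2:, 0]        # covariances of X_1 with X_3, ..., X_n
    kappa2 = S[2:, 1]        # covariances of X_2 with X_3, ..., X_n
    Sigma = S[2:, 2:]        # covariance matrix of X_3, ..., X_n
    return S[0, 1] - kappa1 @ np.linalg.solve(Sigma, kappa2)


# Hypothetical covariance matrix of (X_1, X_2, X_3) for the common-cause example.
S = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 1.0]])

partial = conditional_cov_12(S)
print(partial)
```

In this example $X_1$ and $X_2$ are correlated marginally, but the conditional covariance given $X_3$ should come out as zero up to rounding, since the common cause accounts for the whole association.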


ACKNOWLEDGMENTS 


Many useful comments were made by Dick Berk, John Cairns, Cliff Clogg, Mark Hansen, 
Larry Hanser, Jerome Horowitz, Paul Humphreys, Ron Lee, Tony Lin, Bill Mason, Vaughn 
McKim, Judea Pearl, Diana Petitti, Jamie Robins, Tom Rothenberg, Terry Speed, and Steve 
Turner. Amos Tversky's work on the paper amounted to collaboration. A version of this
paper will also appear in a volume of proceedings, edited by Vaughn McKim and Steve
Turner, published by Notre Dame Press.


REFERENCES 

1. L. M. Bartels, Instrumental and “quasi-instrumental” variables, Amer. J. Pol. Sci. 35
(1991), 777-800.

2. L. M. Bartels and H. E. Brady, The state of quantitative political methodology, in
“Political Science: The State of the Discipline II” (Ada W. Finifter, Ed.), Amer. Pol. Sci.
Assoc., Washington, DC, 1993.

3. P. M. Blau and O. D. Duncan, “The American Occupational Structure,” Wiley, New
York, 1967.

4. J. Cairns, “Cancer: Science and Society,” Freeman, San Francisco, 1978.

5. N. Cartwright, “Nature's Capacities and Their Measurement,” Clarendon Press, Oxford,
1989.




6. C. C. Clogg and A. Haritou, “The Regression Method of Causal Inference and a
Dilemma with This Method,” Technical report, Department of Sociology, Pennsylvania
State University, 1994.

7. J. Cornfield, W. Haenszel, E. C. Hammond, A. M. Lilienfeld, M. B. Shimkin, and E. L.
Wynder, Smoking and lung cancer: Recent evidence and a discussion of some questions,
J. Nat. Cancer Inst. 22 (1959), 173-203.

8. D. R. Cox and N. Wermuth, Linear dependencies represented by chain graphs, Statist.
Sci. 8 (1993), 204-283 (with discussion).

9. R. Daggett and D. Freedman, Econometrics and the law: A case study in the proof of
antitrust damages, in L. LeCam and R. Olshen, eds. “Proceedings, Berkeley Conference
in Honor of Jerzy Neyman and Jack Kiefer,” Vol. I, pp. 126-175, Wadsworth, Belmont,
CA, 1985.

10. J. N. Darroch, S. L. Lauritzen, and T. P. Speed, Markov fields and log-linear interaction
models for contingency tables, Ann. Statist. 8 (1980), 522-539.

11. A. Desrosieres, “La politique des grands nombres,” Editions la Decouverte, Paris, 1993.

12. O. D. Duncan, “Introduction to Structural Equation Models,” Academic Press, New
York, 1975.

13. A. S. C. Ehrenberg and J. A. Bound, Predictability and Prediction, J. Roy. Statist. Soc. Ser.
A 156, Part 2 (1993), 167-206.

14. M. J. Eitelberg, “Manpower for Military Occupations,” Office of the Assistant Secretary
of Defense (Force Management and Personnel), Washington, DC, 1988.

15. R. F. Engle, D. F. Hendry, and J. F. Richard, Exogeneity, Econometrica 51 (1983),
277-304.

16. B. Floderus, R. Cederlof, and L. Friberg, Smoking and mortality: A 21-year follow-up
based on the Swedish Twin Registry, Internat. J. Epidemiology 17 (1988), 332-340.

17. D. Freedman, A note on screening regression equations, Amer. Statist. 37 (1983),
152-155.

18. D. Freedman, As others see us: A case study in path analysis, J. Educ. Statist. 12 No. 2
(1987), (with discussion, whole issue).

19. D. Freedman, Statistical models and shoe leather, in “Sociological Methodology 1991” (P.
Marsden, Ed.), Amer. Sociol. Assoc., Washington, DC, 1991.

20. D. Freedman and D. Lane, “Mathematical Methods in Statistics,” Norton, New York,
1981.

21. C. F. Gauss, “Theoria Motus Corporum Coelestium,” Perthes and Besser, Hamburg,
1809; reprinted by Dover, New York, 1963.

22. D. Geiger, “Graphoids: A Qualitative Framework for Probabilistic Inference,” Ph.D.
dissertation, UCLA, Department of Computer Science, 1990.

23. C. Glymour, A review of recent work on the foundations of causal inference, paper
presented at the Notre Dame conference, 1993.

24. C. Glymour, R. Scheines, P. Spirtes, and K. Kelly, “Discovering Causal Structure,”
Academic Press, New York, 1987.

25. M. Hakama, M. Lehtinen, P. Knekt, A. Aromaa, P. Leinikki, A. Miettinen, J. Paavonen,
R. Peto, and L. Teppo, Serum antibodies and subsequent cervical neoplasms: A prospective
study with 12 years of follow-up, Amer. J. Epidemiology 137 (1993), 166-170.

26. J. Hausman, Specification tests in econometrics, Econometrica 46 (1978), 1251-1271.

27. S. L. Hofferth and K. A. Moore, Early childbearing and later economic well-being, Amer.
Soc. Rev. 44 (1979), 784-815.

28. P. Holland, Statistics and causal inference, J. Amer. Statist. Assoc. 81 (1986), 945-960.

29. P. Holland, Causal inference, path analysis, and recursive structural equations models, in
“Sociological Methodology 1988,” (C. Clogg, Ed.), pp. 449-484, Amer. Sociol. Assoc.,
Washington, DC, 1988.





30. International Agency for Research on Cancer, “Tobacco Smoking,” Monographs on the
Evaluation of the Carcinogenic Risk of Chemicals to Humans, Vol. 38, IARC, Lyon,
France, 1986.

31. J. Kaprio and M. Koskenvuo, Twins, smoking and mortality: A 12-year prospective study
of smoking-discordant twin pairs, Social Sci. Med. 29 (1989), 1083-1089.

32. J. M. Keynes, Professor Tinbergen's method, Econ. J. 49 (1939), 558-570.

33. J. M. Keynes, Comment on Tinbergen's response, Econ. J. 50 (1940), 154-156.

34. H. Kiiveri and T. Speed, Structural analysis of multivariate data: A review, in “Sociological
Methodology 1982,” (S. Leinhardt, Ed.), Jossey Bass, San Francisco, 1982.

35. E. E. Leamer, Vector autoregressions for causal inference, in “Understanding Monetary
Regimes,” (K. Brunner and A. Meltzer, Eds.); supplement to the J. Monetary Econ.,
North-Holland, Amsterdam, 1985.

36. A. M. Legendre, “Nouvelles methodes pour la determination des orbites des cometes,”
Courcier, Paris, 1805; reprinted by Dover, New York, 1959.

37. T. C. Liu, Under-identification, structural estimation, and forecasting, Econometrica 28
(1960), 855-865.

38. R. E. Lucas Jr., Econometric policy evaluation: A critique, in “The Phillips Curve and
Labor Markets,” (K. Brunner and A. Meltzer, Eds.), Carnegie-Rochester Conferences on
Public Policy, Vol. 1, pp. 19-64, with discussion, supplementary series to the J. Monetary
Econ., North-Holland, Amsterdam, 1976.

39. G. S. Maddala, “Introduction to Econometrics,” 2nd ed., McGraw-Hill, New York, 1992.

40. C. F. Manski, Identification problems in the social sciences, in “Sociological Methodology
1993,” (P. V. Marsden, Ed.), pp. 1-56, Blackwell, Oxford, 1993.

41. P. Meehl, “Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of
the Evidence,” University of Minnesota Press, Minneapolis, 1954.

42. K. A. Moore and S. L. Hofferth, Factors affecting early family formation: A path model,
Popul. Environ. 3 (1980), 73-98.

43. J. Pearl, Fusion, propagation and structuring in belief networks, Artif. Intell. 29 (1986),
241-288.

44. J. Pearl, “Probabilistic Reasoning in Intelligent Systems,” Morgan Kaufmann, San Mateo,
CA, 1988.

45. J. Pearl, Comment: Graphical models, causality and intervention, Statist. Sci. 8 (1993),
266-273.

46. J. Pearl, “On the Statistical Interpretation of Structural Equations,” Technical Report,
Computer Science Department, UCLA, 1994a.

47. J. Pearl, “On the Identification of Nonparametric Structural Equations,” Technical
Report, Computer Science Department, UCLA, 1994b.

48. J. Pearl, D. Geiger, and T. Verma, The logic of influence diagrams, in “Influence
Diagrams, Belief Nets and Decision Analysis,” (R. M. Oliver and J. Q. Smith, Eds.), pp.
67-87, Wiley, New York, 1989.

49. J. Pearl and T. Verma, A theory of inferred causation, in “Principles of Knowledge
Representation and Reasoning: Proceedings of the Second International Conference”
(J. A. Allen, R. Fikes, and E. Sandewall, Eds.), pp. 441-452, Morgan Kaufmann, San
Mateo, CA, 1991.

50. J. Pearl and N. Wermuth, When can association graphs admit a causal explanation? in
“Proceedings, Fourth International Workshop on Artificial Intelligence and Statistics,
1993,” pp. 141-150; in “Artificial Intelligence and Statistics” (P. Cheeseman and W.
Oldford, Eds.), Springer-Verlag, Berlin, 1994.

51. R. Peto and H. zur Hausen (Eds.), “Viral Etiology of Cervical Cancer,” Cold Spring
Harbor Laboratory, Banbury Report No. 21, 1986.




52. J. Pratt and R, Schlaifer, On the nature and discovery of structure, J. Amen Statist. Assoc. 
79 (1984), 9-21. 

53. J. Pratt and R. Schlaifer, On the interpretation and observation of laws, J. Econ. 39 
C1988), 23-52. 

54. C, R. Rao, “Linear Statistical Ittf^rence and Its Applications,” 2nd ed., Wiley, New York, 
1973. 

55. R. R. Rindfuss, L. Bumpass, and C. St. John, Education and fertility; Implications for the 
roles women occupy. Amen Social. Rev. 45 (1980), 431-447. 

56. R. R. Rindfuss, L. Bumpass, and C. St. John, Education and the timing of motherhood: 
Disentangling causation, J. Marriage Family 46 (1984), 981—984. 

57. E. Seneta, Discussion, J. Educ. Statist. 12 (1987), 198-201, 

58. K, J. Sherman, J. R. Daling, J. Chu, et al Genitai warts, other sexually transmitted 
diseases, and vulvar cancer, Epidemiology 2 (1991), 257-282. 

59. H. Simon, The meaning of causai ordering, in “Qualitative and Quantitative Social 
Research,” (R. K. Merton, J. S. Coleman, and F. H, Rossi, Eds.), pp. 65-81, Free Press, 
New York, 1980. 

60. N. J. Smelser and D. R. Getstein, “Behavioral and Social Science: Fifty Years of 
Discovery,” National Academy Press, Washington, DC, 1986. 

61. T, P. Speed and H. T. Kiiveri, Gaussian Markov distributions over finite graphs. Ann. 
SlatisL 14 (198S), 138-150. 

62. P. Spirtes, C. Glymour, and R. Scheines, “Causation, Prediction and Search,” Lecture 
Notes in Statistics, Vol. 81, Springer-Verlag, New York/Berlin. 1993. 

63. P. Spirtes, R. Scheines, C. Glymour, and C. Meek, “TITRAD II,” Documentation for 
Version 2.2, Technical Report, Department of Philosophy, Qjrnegie Mellon University, 
Pittsburgh, PA, 1993. 

64. S. Stigler, “The History of Statistics,” Harvard University Press, Boston, 1986. 

65. M. Timberlake and K. Williams, Dependence, political exclusion and government repres¬ 
sion: Some cross national evidence, Amer. Social. Ren. 49 (1984), 141-146. 

66. J. Tinbergen, “Reply to Keynes,” Econ. J. 50 (1940), 141-154.

67. T. Verma and J. Pearl, “Causal Networks: Semantics and Expressiveness,” in
“Uncertainty in AI 4” (R. Shachter, T. S. Levitt, and L. N. Kanal, Eds.), pp. 69-76, Elsevier
Science, Amsterdam, 1990.

68. J. R. Welsh, S. K. Kucinkas, and L. T. Curran, “Armed Services Vocational Battery
(ASVAB): Integrative Review of Validity Studies,” Air Force Human Resources
Laboratory Report AFHRL-TR-90-22, 1990.

69. H. White, A heteroskedasticity-consistent estimator and a direct test for
heteroskedasticity, Econometrica 48 (1980), 817-838.

70. H. White, Maximum likelihood estimation of misspecified models, Econometrica 50
(1982), 1-25.

71. G. U. Yule, An investigation into the causes of changes in pauperism in England, chiefly
during the last two intercensal decades, J. Roy. Statist. Soc. 62 (1899), 249-295.

72. D. Carmelli and W. F. Page, Twenty-four year mortality in World War II US male
veteran twins discordant for cigarette smoking, International Journal of Epidemiology 25
(1996), 554-559.

73. P. Humphreys and D. Freedman, The grand leap, Br. J. Phil. Sci. 47 (1996), 113-123.

74. J. Pearl, Causal diagrams for empirical research, Biometrika 82 (1995), 669-710 (with
discussion).

75. N. Munoz, F. X. Bosch, K. V. Shah, and A. Meheus (Eds.), “The Epidemiology of Human
Papillomavirus and Cervical Cancer,” International Agency for Research on Cancer,
Lyon. Distributed in the U.S.A. by Oxford University Press, 1992.






